Monte Carlo Simulation : Estimate Percolation Threshold In Java

Engineering
Sabin
by Sabin 

As I was in the verge of completing my first week in Algorithm 1 by Princeton University in Coursera a name popped in my screen . It was, “Monte Carlo Simulation”. I pressed pause and thought, what the hell on earth is this nonsense!. Yes, that was the exact same reaction when I saw this name.

Being a student of management I only had hopes when I stared my journey towards learning Artificial Intelligence, (Which I successfully completed and have few projects , articles on my name till date) But then again, I grew this new passion for learning algorithms and data-structure.

Percolation Visualization

Okay! Enough about me let’s start coding!!! And one more thing Here, I have considered that you have prior knowledge of following topics:

  1. Dynamic Connectivity
  2. Graph
  3. Union-Find (Quick union , Quick Find)
  4. Weighted Union Find
  5. Path Compression

If not then you should probably consider visiting them first. As usual you can find source code at my repository here. In addition, Princeton University provides all the necessary dependencies or directory structure on this link.

To make it simpler I will be dividing the explanation part according to two different problem statement:

  1. Percolation
  2. Monte Carlo Simulation

Percolation the general meaning for that is the movement of water through pores on any surface. So what are we trying to concur ? The motivation behind this is to create a matrix of N * N size representing a surface and opening random spots on the surface until the surface leaks. Still not clear?

Don’t worry, I got you. So what we basically do is we create a boolean array of N * N size or in other words with N number of rows and N number of columns, Each cell representing a pore on the surface. So what we need to determine is the percolation threshold or what is the average number of pores required in the surface so that the water leaks from top to bottom.

As in the above shown GIF it represents a surface of size 50 by 50. The black square things are blocked pores , white are open pores , and the squares with blue color represent the flow of water.

You might ask Why do we need this?! Take it as you are doing an experiment where you are mixing different types of metals together to build a new alloy and the quality of that alloy is tested by passing water through it. It would take gazillion man hours if you were to do it physically , So to save time you run it on a computer and see what combinations gives best results. Now that the Why part is answered lets start with coding. Unlike my other articles, This problems implementation is done using Java so I consider you having a little knowledge of Java too.

CODE

As a requirement by the course we create a class named Percolation with following properties:

private boolean[] grid;      
                   
private final WeightedQuickUnionUF gridMap;  
                         
private final int n;                           
private int openCell;              
             
private final int top ;       
                    
private final int bottom;

grid represents the surface part like I explained previously . The grid is filled with either true or false. True being it is open and False being it is blocked.

gridMap represents the weighted quick union structure which I consider you have some prior knowledge. If not consider this as an mechanism to find path among the grid so that we can find path between any given two points inside the grid.

openCell this variable will be used to count the number of open cell or open pores in the surface as we randomly open them.

top and bottom are extra added structures so that we can map if top and bottom have an open path in between . This helps us to draw a conclusion that the surface leaks if any path is found. Here we initialize(top and bottom ) them as starting and ending indexes . So while initializing the gird and grid Map we size them as N * N + 2 for added top and bottom.

public Percolation(int n) {      
                         
   if (n <= 0) throw new IllegalArgumentException();                                    
   gridMap = new WeightedQuickUnionUF(n * n + 2);                               
   grid = new boolean[n*n+2];                              
 
   this.n = n;                               
   openCell = 0;                               
   top = 0;                               
   bottom = n * n + 1;                           
}

As a rule we want our N to be greater than 0 since no structure have negative dimension. As mentioned before we initialize the size of grid and grid map as N * N + 2. Top and Bottom are initialized as the starting and ending index.

private void checkRowsColumn(int row, int col) {                               
if (row < 1 || row > this.n || col < 1 || col > this.n)                                  
throw new IllegalArgumentException();    
                                                    
}

checkRowsColumn function just throws an Illegal Arguement Exception whenever rows and columns are below 1 or above N, If not checked it will give array index out of bound exception later in the program. So to prevent that from happening we have a restriction layer.

private int indexOf(int row , int col){                               
checkRowsColumn(row,col);                            
  
return (row - 1) * n + (col -1);                          
 }

In indexOf method we take row and column and return the index of that particular intersecting point. For example if row = 2 and col = 2 the index for that cell will be 3 , because array’s index starts from 0 (Excluding the logic for Top and Bottom).

public boolean isOpen(int row , int col)                           {                  
            
checkRowsColumn(row , col);                               
return grid[indexOf(row ,col)];                           
}

isOpen function is self explanatory. It checks whether or not the given cell is true or false. True meaning water can pass and False means it resists water.

public boolean isFull(int row , int col)              
{                               
checkRowsColumn(row, col);                               
if(!isOpen(row,col)) return false;                                                       
return gridMap.find(indexOf(row,col)) == gridMap.find(top);                           
}

Similarly isFull checks if the point or cell has a path to the top , Since top is the source of water, If any path exists between the top and bottom cell, water flows through that path.

public int numberOfOpenSites(){                              
return this.openCell;                           
}

This function just returns the total number of pores open or total number of cells containing true values.

public boolean percolates(){                               
return gridMap.find(top) == gridMap.find(bottom);                           
}

A surface is considered to percolate if any path exists between top and the bottom.

public void open(int row, int col) 
{                               
 ........................                                                   
}

So the longest and the hardest logic hides in the open function. Initially the surface does not percolates. We randomly open a pore or initialize a particularly cell to be true and check if path exists between top and bottom. In this function we take two parameter row and col which determines the cell to be opened. The random opening is tackled further in Monte Carlo implementation. For now we will be gazing at the logic behind percolation.

checkRowsColumn(row,col);
int current_cell_index = indexOf(row, col);

Initially we check the given row or column is within the range. If it passes the range, we then determine the index of the cell to be opened. As mentioned earlier the grid is an boolean array so if the value in that index is true then it is considered open. So if the value at the cell is false we try to open that cell. It is not that simple . The algorithm does not work if we only replace false value with true . There are many post condition after we replace false with true.

grid[current_cell_index] = true;                                   this.openCell ++ ;

After we replace false value with true we increase the number of open cell by one. The main motivation to find if the surface percolates is to find a path between top and bottom. So when we open a cell in the surface we need to check various conditions. We need to check if the adjacent cell are open or not. If the adjacent cell are open then there exists a path. We then map that path until and unless it has no where to go or leads to an ending.

if(row == 1) gridMap.union(current_cell_index,top);

The first condition that we check is if the index of the cell lies on the first row. Since every cell at the top is connected with the root in weighted quick find with path compression or in other words the top of the surface has water in it so if there is any hole present the water flows through that hole. To tackle this particular situation what we do is we create a path between top and the cell that we had recently open which imitates the flow of water.

To give a brief on what is path compression think it as a method of recreating a given tree structure where the nodes are arranged in such a way that it takes minimum number of edges to reach a leaf node.

if(row == this.n) gridMap.union(current_cell_index,bottom);

Coming back to the second step, We here check if recently opened cell lies in the bottom row or not . If it does then we connect that opened cell to the bottom cell . Consider the bottom cell as a vessel used to collect water if the surface leaks.

Above

if(row > 1 && isOpen(row - 1 ,col))
{                                       
assert(current_cell_index > n);                                       
gridMap.union(current_cell_index,current_cell_index - n);                                   
}

Now what if recently opened cell is neither on top nor at the bottom. This is where we check adjacent cells. Since recently opened cell might have open cell on its top , bottom , right , left. We check all the conditions. In the above given code it checks if a cell is open on its top i.e (row — 1 row before current row) . If the condition matches we create a link between the cell on top of it. So if there is water present on its top it flows through it.

Below

if (row < this.n && isOpen(row + 1 , col))                                   
{                                       
assert(current_cell_index + n < n * n);                                       
gridMap.union(current_cell_index , current_cell_index + n);                                   
}

Similarly we check the cell on its bottom. If the cell below it is open we create a union between them meaning we create a path between newly opened cell and already open cell below it.

Left

if(col > 1 && isOpen(row, col - 1))                                  
{                                       
gridMap.union(current_cell_index,current_cell_index- 1);                                   
}

Right

if(col < this.n && isOpen(row, col + 1))                                   
{                                       
gridMap.union(current_cell_index,current_cell_index + 1);                                   
}

And in the same way we check potential path between preceding and exceeding cell. Simply if cell on its right or left is open we create a path in between.

So now we have successfully created a model to represent a surface which can determine if the surface leaks or not. As you can see we do not run this model by itself. To visualize this Princeton have included PercolationVisualization.java file.

Monte Carlo Simulation

Our percolation model is just an abstract structure to represent a surface. The exact analysis of the event(percolation) is carried out by simulating the occurrence of leak over and over again. Now lets get going on the simulation part.

The first question you might ask is why do we need this analysis. You might say we already have a model , we just run it and check if the surface leaks or not!

You are not wrong but not exactly correct either. Since we open a cell uniformly at random the occurrence of a leak varies. So we need a concrete analysis on the average return(in this case finding a leak) , standard deviation , highest confidence level and lowest level.

So Monte Carlo method is a mathematical technique to estimate the possible outcome of an uncertain event. The fraction of the cell that are opened when the system percolates gives an estimation of the percolation threshold. So our task it to see what is the number of open cell required so that the surface percolates.

All the mathematical equation are given by Princeton on this page. As they mention we need to find the average , standard deviation , low endpoint of 95% confidence interval and finally high end point of 95% confidence interval. To do this we initialize 4 variables namely:

private final double mean ;
private final double sd;                           
private final double highConfidence;
private final double lowConfidence;

The main logic behind this simulation is that we initialize a surface with size N check if it percolates with respect to randomly opened cells and repeat this process for a given number of times.

if(n <= 0 || trials <= 0) throw new 
IllegalArgumentException();

Initially we check if the given size ( i.e n) and trials (i.e number of repetition) is greater than 0. If the given parameters are valid we create 3 variables.

  1. Array to store the result namely results
double[] results = new double[trials];
  1. Integer variable to store randomly generated row number
int testY;
  1. Integer variable to store randomly generated column number
int testX;

After initializing the required variable we loop over trail times.

for(int i = 0 ; i < trials;i ++) { ---------- }

Initializing the percolation model that we built earlier with size n.

Percolation per = new Percolation(n);

Now the fun part

while(!per.percolates())                                   
{                                           
     testX = (int)((StdRandom.uniform() * n) + 1);                                              
     testY = (int)((StdRandom.uniform() * n) + 1);                                           
    if(!per.isOpen(testY,testX))                                              
     {                                                 
         per.open(testX,testY);                                           
     }                                   
}                                   
results[i] = (double)(per.numberOfOpenSites()) / (n * n);                               
}

We loop over a point till the surface percolates. Initially the system does not percolates as all the cells are closed. Now we generate a random row and column uniformly at random. After we have new set of row and cols we open that particular cell using open function that we previously created on our percolation model. After the new cell is opened we again check if the surface or model percolates. This process continues until and unless the system percolates. If the system percolates it comes out of the while loop and the number of open sites or cells is divided with its total size i.e (n * n) giving us a fraction or an estimate. Then we store this estimate on an array and repeat this process trails times.

mean = StdStats.mean(results);                               
sd = StdStats.stddev(results);                                                       
lowConfidence =  mean - (1.96 * sd)/ Math.sqrt(trials);                               
highConfidence =  mean + (1.96 * sd)/ Math.sqrt(trials);

After the simulation is carried out trials time , we calculate the averate with the help of library provided by Princeton i.e StdStats.mean().

Similarly we calculate the standard deviation using StdStats.stddev() on the results that we have gathered.

To calculate high and low endpoint of 95% confidence interval we use the formula provided by Princeton’s guide and store them respectively.

Now to run Monte Carlo Simulation all we have to do is initialize an object of PercolationStasts and rest is handled automatically.

PercolationStats s = new PercolationStats(50,10000);
System.out.println("Mean:" + s.mean());
System.out.println("Standard Deviation:"+ s.stddev());
System.out.println("High Confidence" + s.confidenceHi());
System.out.println("Low Confidence" + s.confidenceLo());

Here we check a surface with size 50 for 10000 time . The results looks something like this

Monte Carlo Simulation

Monte Carlo Simulation

Monte Carlo Simulation

This artical was first published in, Sabin Sharma - Medium

Share Article

Few more blogs...