Saturday, January 10

Using Bayes' Theorem to Track Probability Distribution of Minesweeper Grid (Part 2)

This post is part of a series; please see the first post for more background.

A successful attempt by the solver


Developing the Solver

When a game of minesweeper begins, two pieces of information are known: the number of elements in the grid and the number of mines. From this information alone, the probability that any given element contains a mine is

$$p(x_i) = \frac{N_{\text{mines}}}{N_{\text{cells}}}$$
This seems intuitive, but it is actually a special case of a more general form.

The probability of a random event can be expressed as the number of outcomes for which the event occurs divided by the total number of possible outcomes. In the case of minesweeper, the probability of a particular square containing a mine can be found by dividing the number of configurations of mines that include the square by the number of possible configurations.

The example below illustrates the probability of a mine occupying the square in the top left corner.

$$p = \frac{\text{three favorable configurations}}{\text{six possible configurations}} = \frac{3}{6} = \frac{1}{2}$$
The number of possible configurations of mines has a special name and notation. The expression in parentheses below is pronounced "n choose k". It is equal to the number of distinct k-sized subsets that can be selected from a set of n elements.

Binomial Coefficient:

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$
Consider the animations above. The number of favorable configurations on top is "three choose one", while the number of possible configurations on the bottom is "four choose two". At the start of the game, these counts are the same for every element, so the initial probabilities are equal across the board. For each element, dividing the number of favorable configurations by the total number of configurations reduces to the intuitive ratio that we started with:

$$\frac{\binom{n-1}{k-1}}{\binom{n}{k}} = \frac{k}{n}$$
It may seem like a lot of unnecessary thought went into calculating something as intuitive as the initial distribution. However, as the grid becomes populated with measurements, we will have to build on this approach to calculate the new distributions.
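As a quick sanity check, this reduction is easy to verify numerically. Here is a minimal Python sketch (the variable names are my own), using the example grid from the animations:

```python
from math import comb

# Probability that a fixed cell holds a mine in an n-cell grid with k mines:
# configurations containing that mine, divided by all configurations.
n, k = 4, 2
favorable = comb(n - 1, k - 1)  # remaining k-1 mines among the other n-1 cells
total = comb(n, k)              # all ways to place k mines in n cells
print(favorable, total)            # 3 6 -- "three choose one" over "four choose two"
print(favorable / total == k / n)  # True
```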


Probability Density Functions and Expectation

A probability density function (PDF)
One thing to note: throughout these posts I refer to the set of probabilities as a "probability distribution" in the sense that it is a distribution of probabilities over the grid. That phrase usually describes a probability density function (PDF), or, in the discrete case, a probability mass function. The distribution I refer to is not one of these because the sum over all elements is not equal to one. If we were tracking the probability of one particular mine occupying each element in the grid, that could be represented with a PDF. Instead, we are tracking the probability of any mine occupying each element.

If we sum all of the elements in the probability distribution, we get the expectation of the grid: the expected number of mines. Because the probabilities under consideration represent boolean states, the usual formula for expectation reduces to a sum over the probabilities. The expectation of the grid will always be equal to the number of mines in the grid, no matter how the probability distribution changes.
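In symbols, with $X_i \in \{0, 1\}$ indicating whether element $i$ contains a mine:

$$E\Big[\sum_i X_i\Big] = \sum_i E[X_i] = \sum_i \big(1 \cdot p(x_i) + 0 \cdot (1 - p(x_i))\big) = \sum_i p(x_i) = N_{\text{mines}}$$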

We have computed an initial set of probabilities. I'll refer to the set of probabilities at any particular step in the game as the belief. When the first element is selected, assuming the minesweeper implementation is user-friendly, it is guaranteed to be empty. Therefore, the new belief over the remaining elements will be uniform and equal to

$$p(x_i) = \frac{N_{\text{mines}}}{N_{\text{cells}} - 1}$$
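A minimal sketch of this first update, assuming the belief is stored as a flat array of per-cell probabilities (the names n_cells, n_mines, and first are mine):

```python
import numpy as np

n_cells, n_mines = 30, 5

# Initial belief: every cell is equally likely to contain a mine.
belief = np.full(n_cells, n_mines / n_cells)

# The first revealed cell is guaranteed empty, so its probability drops to
# zero and the mines are spread uniformly over the remaining cells.
first = 12                              # index of the first revealed cell
belief[:] = n_mines / (n_cells - 1)
belief[first] = 0.0

assert np.isclose(belief.sum(), n_mines)  # the expectation is preserved
```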
We need to incorporate the measurement m into the latest belief. Given a particular measurement at element j, for every element i in the grid, this is like asking "what is the probability of element i containing a mine given that a measurement m was made at element j?" We can express this with the notation

$$p(X_i = 1 \mid M_j = m)$$
If this is done for every element in the grid, we have successfully incorporated the new measurement into our belief. The trouble is, how do we actually calculate this conditional probability?

Applying Bayes' Theorem

We can start by applying Bayes' rule and breaking the conditional probability down into components.

$$p(X_i = 1 \mid M_j = m) = \frac{p(M_j = m \mid X_i = 1)\, p(X_i = 1)}{p(M_j = m)}$$

We already know p(x_i): this is the prior probability that we started with. The term p(M_j = m | X_i = 1) is a bit more nuanced; this is the probability of a measurement m at element j given the assumption that there is a mine at element i. I'll put that discussion on hold for now in order to address the denominator.

The computation of the denominator term p(M_j = m) can be sidestepped entirely. To see why, observe that the denominator does not depend on which element we are evaluating: every time we apply Bayes' rule, it is the same for every element in the grid. We can represent it with a constant 1/c.

$$p(X_i = 1 \mid M_j = m) = c\, p(M_j = m \mid X_i = 1)\, p(X_i = 1)$$

Remembering that the expectation should always be equal to the number of mines, we can compute c as follows:

$$\sum_i p(X_i = 1 \mid M_j = m) = \sum_i c\, p(M_j = m \mid X_i = 1)\, p(X_i = 1) = N_{\text{mines}}$$

Because c is independent of the element in question, it can be pulled out of the summation.

$$c \sum_i p(M_j = m \mid X_i = 1)\, p(X_i = 1) = N_{\text{mines}}$$

Solving for c gives

$$c = \frac{N_{\text{mines}}}{\sum_i p(M_j = m \mid X_i = 1)\, p(X_i = 1)}$$

Effectively, we can compute intermediate probabilities and then scale them so that their sum is equal to the expectation. This should save a lot of computational effort, because calculating p(m) directly requires enumerating every possible configuration of mines.
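A sketch of that scaling step, assuming the unnormalized products p(m | x_i) · p(x_i) have already been computed (the numbers below are made up for illustration):

```python
import numpy as np

def scale_to_expectation(unnormalized, n_mines):
    """Scale intermediate probabilities so they sum to the number of mines,
    absorbing the shared denominator p(m) into the single constant c."""
    c = n_mines / unnormalized.sum()
    return c * unnormalized

raw = np.array([0.10, 0.05, 0.05, 0.20])   # p(m | x_i) * p(x_i) for a toy grid
posterior = scale_to_expectation(raw, n_mines=1)
print(posterior)        # [0.25  0.125 0.125 0.5  ]
print(posterior.sum())  # 1.0 -- equal to the number of mines
```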

What's in a measurement?

Returning to the final unknown term p(m | x), how do we calculate the probability of a measurement? To answer this, we need to consider how measurements impact the distribution of mines.

 

A measurement can be viewed as a constraint on the distribution of mines applied to all squares adjacent to the measurement. As more squares are revealed, more measurements and constraints are added to the list.
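In symbols, a measurement m_j revealed at element j constrains the sum of the mine indicators of its adjacent squares:

$$\sum_{i \in \mathrm{adj}(j)} x_i = m_j$$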



All possible configurations of mines must satisfy every constraint simultaneously. This constraint satisfaction task can be expressed as a system of linear equations.



Purely deductive solvers operate by identifying elements whose state is the same in every valid configuration. In the above example, considering that there is only a single mine in the grid, it is fairly clear that x_3 is the only valid location for a mine. Therefore, x_1, x_2, x_4, and x_5 are safe locations for the player to reveal.
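A brute-force sketch of that style of deduction (the constraint matrix below is a hypothetical stand-in, not the one from the figure): enumerate every boolean configuration, keep those that satisfy M x = b and the global mine count, and flag cells whose state agrees across all survivors.

```python
import numpy as np
from itertools import product

# Hypothetical constraints over five cells, with one mine in the grid.
M = np.array([[1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1]])
b = np.array([1, 1])
n_mines = 1

valid = []
for bits in product((0, 1), repeat=M.shape[1]):
    x = np.array(bits)
    if np.array_equal(M @ x, b) and x.sum() == n_mines:
        valid.append(x)

solutions = np.array(valid)
print(solutions)                    # [[0 0 1 0 0]] -- only the third cell works
print(solutions.min(axis=0) == 1)   # mines in every valid configuration
print(solutions.max(axis=0) == 0)   # safe in every valid configuration
```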

Consider another situation depicted below:

Another situation
The corresponding constraint equation would be:


It is possible to express the total number of mines as part of the constraint matrix by appending a row of ones at the bottom and augmenting the x and b vectors accordingly.

$$\begin{bmatrix} M \\ \mathbf{1}^{\mathsf{T}} \end{bmatrix} x = \begin{bmatrix} b \\ N_{\text{mines}} \end{bmatrix}$$

It is not wise to do this for large grids, because the constraint matrix would have to be expanded to include every element on the board.

One way to contrast the probabilistic approach with the constraint-based one is to look at the types of values that x can represent. In the constraint-based approach, each element of x takes a boolean value: 0 or 1. Either that particular element contains a mine or it doesn't. The probabilistic approach loosens things up: each element can take any value between 0 and 1. In the context of the constraint matrix, nothing changes; the variable x still has to satisfy the constraint equation.

One particular solution to the above equation is


The system admits more solutions than this, however. There is a whole space of solutions that satisfy the constraints. To demonstrate this, consider the null space of M. This is the set of all values of x that yield the zero vector when multiplied by M.

The null space of M is the span of all linearly independent vectors orthogonal to the row space of M.

Null space basis determined using GNU Octave null() function.


If we add an element of the null space to the particular solution above, we end up with another valid solution.


It follows that all possible solutions to the constraint equation can be expressed as the sum of a particular solution and a linear combination of null space vectors. It is important to recognize that, while these solutions satisfy the constraint matrix, not all of them represent valid probability distributions: certain solutions may include elements with negative values or values greater than one.
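To see this numerically, here is a small sketch using the same stand-in constraint system as above, with the null-space basis computed from the SVD instead of Octave's null():

```python
import numpy as np

M = np.array([[1., 1., 1., 0., 0.],
              [0., 0., 1., 1., 1.]])
b = np.array([1., 1.])

# A particular solution via least squares (any consistent solution works).
x_p, *_ = np.linalg.lstsq(M, b, rcond=None)

# The right singular vectors beyond the rank of M span its null space.
_, s, vt = np.linalg.svd(M)
rank = int(np.sum(s > 1e-10))
null_basis = vt[rank:]

# Adding any combination of null space vectors yields another solution...
x_new = x_p + 0.3 * null_basis[0] - 0.7 * null_basis[1]
print(np.allclose(M @ x_p, b), np.allclose(M @ x_new, b))  # True True

# ...but not necessarily a valid probability distribution: entries of x_new
# may fall outside [0, 1].
```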

There is one additional observation to make about the constraint matrix. Each individual constraint (each row in the constraint matrix) can be interpreted as a localized expectation: the probabilities of the squares adjacent to a measurement should sum to the measured value. This mirrors the intuition about global expectation from the beginning of this post: the global expectation should be equal to the number of mines hidden in the grid.

p(m | x)

Getting back to the main question: how do we compute the probability of a measurement? This is where I am running into trouble. My current approach is to use the constraint matrix to enumerate every possible configuration of mines that does not violate the constraints. For each configuration of mines, I calculate its probability using the prior distribution p(x_i).


$$p(\{a\}) = \prod_{i \in \{a\}} p(x_i) \prod_{i \notin \{a\}} \big(1 - p(x_i)\big)$$

where {a} is the set of all mined elements in the configuration.

I then add together each of those probabilities. For the conditional probabilities p(m | x_i), I follow the same process, starting from the assumption that element i contains a mine.
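A condensed sketch of that enumeration (the helper below is my own illustration, not the solver's actual code). Restricting the sum to configurations with a mine at element i gives the joint probability p(m, x_i = 1); dividing by the prior p(x_i) then yields the conditional term.

```python
import numpy as np
from itertools import product

def prob_of_measurement(M, b, prior, n_mines, fixed=None):
    """Sum the prior probability of every mine configuration that satisfies
    the constraints M x = b and the global mine count. If `fixed` is given,
    only count configurations that place a mine at that element."""
    total = 0.0
    for bits in product((0, 1), repeat=len(prior)):
        x = np.array(bits)
        if fixed is not None and x[fixed] != 1:
            continue
        if x.sum() != n_mines or not np.array_equal(M @ x, b):
            continue
        # p(config) = product of p(x_i) over mines and 1 - p(x_i) over empties
        total += np.prod(np.where(x == 1, prior, 1.0 - prior))
    return total

# Toy usage with the stand-in constraints from earlier.
M = np.array([[1, 1, 1, 0, 0], [0, 0, 1, 1, 1]])
b = np.array([1, 1])
prior = np.full(5, 0.2)
print(prob_of_measurement(M, b, prior, n_mines=1))  # 0.2 * 0.8**4
```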

This approach seems to work for the most part. For a couple of small test cases, I've been able to verify the probabilities by hand. Unfortunately, there are times when my solver incorrectly declares certain elements to have zero probability. I'm hoping that the problem is with my implementation: maybe I'm not properly handling certain edge cases when I attempt to generate valid configurations. However, I'm beginning to suspect that there is something wrong with my entire approach.

I think my next step will be to attempt to create the hybrid solver that I suggested in the first post. That way I can at least compare the probability distributions that each method generates.

Please feel free to comment if you have any feedback or suggestions.
