Jonathon Makepeace Matthew Harris Jamie Sparrow Julian Hillebrand

ISudoku

Abstract

In this paper, we analyze and discuss the Sudoku puzzle and implement several algorithms to solve it. After analyzing each algorithm, we compare their efficiency and determine which implementation is best. We also state the methods used to test each algorithm and list the constraints that lead to our conclusion.

1. Introduction

Sudoku is a puzzle game often found in newspapers and magazines. The object of the game is to place numbers in a grid such that certain conditions are met. On paper, many people can solve a Sudoku puzzle without much difficulty given enough time. However, some puzzles can be too difficult for a person to solve in a reasonable amount of time. In this paper, we analyze four algorithms for computing solutions to Sudoku puzzles.

2. Informal Statement of Problem

Traditionally, the object of Sudoku is, given a minimum of 17 clue numbers already placed in the cells of a 9×9 grid subdivided into 3×3 regions, to place the integers 1-9 in each empty cell such that no row, column, or region of cells contains a duplicate digit. This definition is not strict, however; other symbols may be used in place of integers, the number of unique symbols may vary, and a puzzle does not need to be 9×9 in size. For our implementation purposes, we focused on the traditional 9×9 puzzle grid.

3. Formal Statement of Problem

The problem of solving a Sudoku puzzle can be expressed formally as a constraint satisfaction problem with three constraints. The first constraint requires that all numbers 1 through n be placed in a row with n cells; the second and third constraints require the same for the puzzle's columns
and regions, respectively. Given an n × n Sudoku grid, the Sudoku is successfully solved if:

$\forall x \in \mathbb{N}$ and $\forall i \in \mathbb{N}$, with $x \le n$ and $i \le n$: $\exists j \in \mathbb{N}$ (with $j \le n$) such that numberat(i, j) = x.
(For any given number x [between 1 and n] and row number i [between 1 and n], there is a column number j [between 1 and n] such that x can be found in row i, column j.)

$\forall x \in \mathbb{N}$ and $\forall j \in \mathbb{N}$, with $x \le n$ and $j \le n$: $\exists i \in \mathbb{N}$ (with $i \le n$) such that numberat(i, j) = x.
(For any given number x and column number j, there is a row number i...)

$\forall x \in \mathbb{N}$ and $\forall r \in \mathbb{N}$, with $x \le n$ and $r \le n$: $\exists i \in \mathbb{N}$ (with $i \le n$) and $\exists j \in \mathbb{N}$ (with $j \le n$) such that regionof(i, j) = r AND numberat(i, j) = x.

4. Backtracking

A common naive algorithm for solving a Sudoku puzzle is brute-force backtracking. For an n × n puzzle board, backtracking generates all possible configurations of n symbols (most commonly represented by the integers in the range [1, n]) to fill each empty cell. Each of these configurations is tested until a solution to the puzzle is found. The idea of backtracking is to exhaust all possibilities among a large number of candidates and pick the first answer that is valid. Using recursion, the algorithm generates a depth-first tree of possibilities that grows exponentially with each blank cell to be filled. Our recursive implementation of backtracking in Python is shown in Figure 4.1.

    def backtrackingsolve(self):
        c = None
        for cell in self.cells:  # Iterate over each cell in the puzzle
            if cell.num is None:
                c = cell  # Work with the first blank cell found.
                break
        if c is None:  # If no empty cell is found, the puzzle is solved
            return True
        for candidate in range(1, 10):
            if c.isvalidcandidate(candidate):
                c.num = candidate
                if self.backtrackingsolve():  # Recursively branch down possibilities for cell c
                    return True
                c.num = None  # Undo the placement before trying the next candidate
        return False

Figure 4.1

By default, backtracking is not an efficient algorithm due to its naive implementation. At each empty cell on an n ×
n board, backtracking takes at worst $O(n)$ attempts to find a valid symbol for that cell that does not violate any of the three constraints, since it iterates through every symbol. In the best case, the first symbol tested in any empty cell (in our case, the integer 1) happens to be a valid candidate, and the cell is filled in $O(1)$ constant time. After a valid candidate for the empty cell is found, it is placed in the cell. The algorithm branches down the decision tree at that cell and recursively attempts to find a valid symbol for the next empty cell, backtracking to the previous cell if one is not found. Therefore, in the worst case, each subsequent empty cell must work through n possibilities for each of the n possibilities of the previous cell (a total of $n^2$ possibilities for two empty cells). The complexity thus increases exponentially with every empty cell in the puzzle. The worst-case complexity of the backtracking algorithm is then $O(n^m)$, where m is the initial number of empty cells (that is, $n^2$ minus the number of clue numbers).

Figure 4.2

Figure 4.2 shows the increase in average run times for the backtracking algorithm when it is given 10 puzzles of each of 3 difficulties. The easy, medium, and hard puzzles tested initially contain 40, 33, and 26 clue numbers, respectively.

5. Dancing Links

Dancing Links is not the easiest method one could implement to solve a Sudoku puzzle. However, the ingenious strategy, developed by Donald Knuth, applies well to this type of problem. He developed this technique to implement his Algorithm X, which is used to solve Exact Cover problems. An Exact Cover problem requires a solution that satisfies every given constraint exactly once, as in our Sudoku puzzle. Applying Algorithm X requires a doubly linked list of all
possibilities and an efficient use of backtracking. The doubly linked list is set up as a sparse matrix, which creates a node for each non-zero entry and links it to its neighbors in the same row and column. Then, when running the algorithm, we begin by covering a node and then covering all other nodes in its set, or web of linked nodes. If there are nodes that can't be covered by the set, it is not the solution, and our algorithm backtracks by uncovering the nodes in the reverse order of how they were covered. If no nodes remain, however, then our set must be a solution.

Figure 4.3

We estimated that Dancing Links has a complexity of $O(n^3)$, as it is essentially sorting through a grid or list (n) of doubly linked lists (n*n). As the puzzle difficulty increased, the overall complexity of the algorithm did not. Of course, the specific set that would be the solution became more difficult to discover, which led to an increase in run time.

6. Crook's Algorithm

One of the simplest and easiest to understand algorithms for solving the Sudoku puzzle was developed by J.F. Crook. Crook's algorithm was designed as a pen-and-paper method for solving the puzzle and can be carried out by anyone given enough time. It is important to look at the definition of the puzzle when solving it, as it provides insight into how to find the answer. The solution of a Sudoku puzzle requires that every row, column, and box contain all the numbers in the set [1, 2, ..., 9] and that every cell be occupied by one and only one number (Crook, 2009). This means that only one copy of each digit can be in every set, whether it is a row, a column, or a box. Using Crook's algorithm, we must keep a preemptive set of possible values that could be placed in each cell. For each cell, a markup is created to record the possible values; on pen and paper it is used as a visual
reference to keep track of the values in the game. After determining the markup, the user must pick a cell to start the process. The next step requires the user to pick a number out of the markup values. Doing this repeats the previously mentioned step: go through each cell in the row and column and remove the entered value from the possible markup values. Repeat the process until all cells have been determined and the grid qualifies as a solution under the Sudoku puzzle definition. These steps can be simplified further as follows:

1. Mark up the cells, i.e., list the numbers that each cell may contain.
2. Look at each column, row, and 3x3 box and break it down into preemptive sets. Break down the sets and use the Occupancy Theorem¹ whenever possible.
3. Determine whether the puzzle is finished, not possible, or needs a next step.
4. Choose an empty cell and mark a number and color. Repeat step 2 until no more preemptive sets remain.
5. Go through the markup until you solve the puzzle or determine that it is not possible.

¹ Let X be a pre-emptive set in a Sudoku puzzle markup. Then every number in X that appears in the markup of cells not in X over the range of X cannot be a part of the puzzle solution.

Figure 4.4

Crook's algorithm is basically a trial-and-error algorithm. After initializing the markup, it goes through each possibility until it finds a solution. It can be considered an exhaustive search, but it has additional elements associated with it. We determined that the complexity of this algorithm is $O(n^2)$ in the average case, because the algorithm must sort through two different sets to reach the markup in each cell. The complexity increases at a rate that is near that of backtracking. There are some slight deviations in the data, where it takes more time to solve easy puzzles compared to the other algorithms. This could be caused by the design of the algorithm, given that it
has to go through the possibilities until it finds the first solution. As mentioned previously in the backtracking section, as the difficulty increased, the run time increased exponentially. The main cause of this is the increase in blank spaces as the difficulty goes up. There were also factors that affect the number of possible answers in the final solution to a puzzle. Some instances of the puzzle had only a single solution; these types of puzzles are called diabolical. Each number has to be in a specific spot in the puzzle for the solution to be found. On these puzzles, Crook's algorithm would take an exceedingly long amount of time compared to other puzzles (as seen in Figure 4.4), because of the number of sets each iteration would have to go through and validate for correctness.

Genetic Algorithm

A genetic algorithm is a general way to solve optimization problems. The basic algorithm is very simple:

1. Create a population (vector) of random solutions (represented in a problem-specific way, but often a vector of floats or ints).
2. Pick a few solutions and sort them according to fitness.
3. Replace the worst solution with a new solution, which is either a copy of the best solution, a mutation (perturbation) of the best solution, an entirely new randomized solution, or a cross between the two best solutions. These are the most common evolutionary operators, but you could dream up others that use information from existing solutions to create new, potentially good solutions.
4. Check if you have a new global best fitness; if so, store the solution.
5. If too many iterations go by without improvement, the entire population might be stuck in a local minimum (at the bottom of a local valley, with a possible chasm somewhere else, so to speak). If so, kill everyone and start over at 1.
6. Go to 2.

Fitness is a measure of how good a solution is, lower meaning better. This measure is performed by a fitness function that you supply.
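The six steps above can be sketched as a small steady-state loop. This is only an illustrative sketch, not the library used in this section: the function `evolve`, its parameters, and the toy bit-vector problem at the bottom are all hypothetical names chosen for the example, and stopping once the fitness reaches `target` is an added assumption.

```python
import random


def evolve(fitness, random_solution, mutate, crossover, *, pop_size=40,
           tournament_size=4, max_iters=20_000, stall_limit=2_000,
           target=0, rng=None):
    """Hypothetical tournament-selection loop; lower fitness is better."""
    rng = rng or random.Random()
    # Step 1: create a population of random solutions.
    population = [random_solution(rng) for _ in range(pop_size)]
    best = min(population, key=fitness)
    best, best_fit, stalled = list(best), fitness(best), 0
    for _ in range(max_iters):
        if best_fit <= target:  # assumption: stop once the target is reached
            break
        # Step 2: pick a few solutions and rank them by fitness.
        picks = sorted(rng.sample(range(pop_size), tournament_size),
                       key=lambda i: fitness(population[i]))
        first, second, worst = picks[0], picks[1], picks[-1]
        # Step 3: replace the worst pick using one of the four operators.
        op = rng.choice(("copy", "mutate", "random", "cross"))
        if op == "copy":
            child = list(population[first])
        elif op == "mutate":
            child = mutate(population[first], rng)
        elif op == "random":
            child = random_solution(rng)
        else:
            child = crossover(population[first], population[second], rng)
        population[worst] = child
        # Step 4: store a new global best if one was found.
        child_fit = fitness(child)
        if child_fit < best_fit:
            best, best_fit, stalled = list(child), child_fit, 0
        else:
            stalled += 1
        # Step 5: restart the whole population when progress stalls.
        if stalled >= stall_limit:
            population = [random_solution(rng) for _ in range(pop_size)]
            stalled = 0
        # Step 6: loop back to step 2.
    return best, best_fit


# Toy problem (an assumption for the demo): find a 12-bit vector of all ones.
def random_bits(rng):
    return [rng.randint(0, 1) for _ in range(12)]


def flip_one(bits, rng):
    out = list(bits)
    out[rng.randrange(len(out))] ^= 1
    return out


def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def zeros(bits):
    return bits.count(0)  # fitness: number of zero bits, lower is better


if __name__ == "__main__":
    print(evolve(zeros, random_bits, flip_one, one_point, rng=random.Random(0)))
```

The operators return new lists rather than modifying their arguments, so the stored best solution is never changed by later iterations.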
Writing a fitness function is how you describe the problem to the GA.
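As an illustration of how a Sudoku puzzle could be described to the GA, the sketch below scores a candidate grid. It is one possible fitness function, not necessarily the one used in the experiments reported here: the names `units` and `sudoku_fitness`, the flat 81-element list layout, and the use of 0 for blank clue cells are assumptions made for this example. The score is zero exactly when every row, column, and box contains nine distinct digits and every clue is respected, assuming all entries are digits 1 to 9.

```python
def units():
    """Yield the 27 index groups (rows, columns, 3x3 boxes) of a flat 9x9 grid."""
    for r in range(9):
        yield [9 * r + c for c in range(9)]
    for c in range(9):
        yield [9 * r + c for r in range(9)]
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            yield [9 * (br + dr) + (bc + dc)
                   for dr in range(3) for dc in range(3)]


def sudoku_fitness(candidate, clues):
    """Lower is better; 0 means a valid solution that matches the clues.

    candidate: list of 81 digits in 1..9 (row-major order).
    clues:     list of 81 ints, 0 for a blank cell, otherwise the given digit.
    """
    # Each unit is penalized by the number of digits it is missing.
    missing = sum(9 - len({candidate[i] for i in unit}) for unit in units())
    # The GA also has to rediscover the given digits, so penalize mismatches.
    wrong_clues = sum(1 for got, want in zip(candidate, clues)
                      if want and got != want)
    return missing + wrong_clues


if __name__ == "__main__":
    # A known valid grid built from a shifting pattern, used only as a demo.
    solved = [(3 * (r % 3) + r // 3 + c) % 9 + 1
              for r in range(9) for c in range(9)]
    clues = [d if i % 3 == 0 else 0 for i, d in enumerate(solved)]
    print(sudoku_fitness(solved, clues))
```

Only the ordering of the returned values matters to the search, so the two penalty terms are simply added without weights.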
The magnitude of the fitness values returned does not matter (in sane implementations), only how they compare to each other. There are other, subtly different, ways to perform the evolutionary process. Some are good and some are popular but bad. The one described above is called tournament selection, and it is one of the good ways. Much can be said about the intricacies of GA, but it will have to be said somewhere else, lest I digress completely.

Within a few minutes (about 2.6 million iterations when I tried), the correct answer pops out! The nice thing about this method is that you do not have to know anything about how to solve a Sudoku puzzle or even think very hard at all. Note that I did not even bother to let it search only for the unknown values; it also has to find the digits that we already know (which should not be too hard with a decent fitness function). The only bit of thinking we did was to understand that a Sudoku solution has to be a permutation of the 81-element list formed by repeating [1, 2, 3, 4, 5, 6, 7, 8, 9] nine times, but this merely made the evolution part faster. If we wanted to make it faster still, we could make a genome type that lets us say that there are actually nine separate vectors that are each guaranteed to be a permutation of 1 to 9. We could have thought even less and represented the solution by 81 ints that are all in the range 1 to 9, by using another genome type:

    >> genome = stdgenomes.enumgenome(81, range(1,10))

The range argument to EnumGenome does not have to be a vector of integers; it could be a vector of any objects, since they are never treated like numbers. In my experiment this took maybe 15-30 minutes to solve.
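To compare the three representations just described, their search-space sizes can be computed directly. This is a small worked calculation rather than part of any GA library; the function name `search_space_sizes` and the dictionary keys are made up for the example. It counts 81 free digits for the enumerated genome, distinct arrangements of nine copies of each digit for the single permutation genome, and nine independent permutations of 1 to 9 for the split genome.

```python
from math import factorial


def search_space_sizes():
    """Return the number of candidate grids under each representation."""
    return {
        # 81 cells, each any of 9 digits.
        "enum": 9 ** 81,
        # Distinct orderings of a list holding nine copies of each digit.
        "permutation": factorial(81) // factorial(9) ** 9,
        # Nine separate vectors, each a permutation of 1..9.
        "nine_permutations": factorial(9) ** 9,
    }


if __name__ == "__main__":
    for name, size in search_space_sizes().items():
        # Print the number of decimal digits to show the order of magnitude.
        print(f"{name}: about 10^{len(str(size)) - 1}")
```

Each added structural guarantee shrinks the space the GA has to explore, which is consistent with the shorter run time reported for the permutation genome.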
For more difficult Sudoku puzzles, I would definitely go with the permutation genome, since using EnumGenome increases the search space to $9^{81}$ possible solutions.

7. Conclusion

Our initial estimates concluded that Dancing Links would be the most efficient
method out of the ones listed. But with its complexity found to be larger than that of the other algorithms, we grew unsure. Looking at the data, it actually seems to be the slowest of the algorithms. Backtracking and Crook's algorithm, per the data, are quite similar. Though those algorithms require a thorough check of all empty cells and possibilities, Dancing Links starts with all possibilities linked together in a large matrix and runs through sets, cutting down those possibilities.

8. References

Crook, J. (2009). A Pencil-and-Paper Algorithm for Solving Sudoku Puzzles. [PDF file] Retrieved from https://www.ams.org/notices/200904/tx090400460p.pdf
Knuth, D. (2000). Dancing Links. [PDF file] Retrieved from https://arxiv.org/pdf/cs/0011047v1.pdf
Zibbu, S. (2018). Sudoku and Backtracking - Hacker Noon. [online] Available at: https://hackernoon.com/sudoku-and-backtracking-6613d33229af [Accessed 7 Dec. 2018].