How to Never Lose Wordle
For this recreational project, I created a program which provides a best guess for your wordle game. Creating an effective, time-efficient wordle solver is a non-trivial problem. Put in different terms we need the program to provide a guess that will, on average, eliminate the most words from our list of possible answers. Doing this directly would involve checking each possible guess, by cycling through each possible final answer, determining the number of words this final answer would eliminate from our list of possible final answers, then averaging this value for every final guess. We would have to do this for every possible guess, and you can quickly see the problem here. The number of words that can be a solution to the daily wordle puzzle is greater than two thousand. The algorithm described above has an n^3 growth rate, meaning for our first guess, we would have billions of words to compare. To fix this problem we must narrow our search.
When looking over the list of possible guesses we can likely skip over words like ‘jazzy’. This word has uncommon and repeated characters both of which likely won’t help with eliminating possible final answers. How do we quantify this feeling of a word being not-helpful, and more importantly how do we create an adaptive definition of not-helpful. It is likely that in at least one game scenario the word ‘jazzy’ would make the perfect guess.
Here was my approach. Using our list of possible answers we create a 26 by 5 array. Each entry of the array corresponds to the number of uses of that particular letter in that particular spot in the whole list of final answers. For example if the entry [13, 3] is 51 then that means that ‘m’ (the 13th character of the alphabet) appears as the third letter of 51 words in our list of possible solutions. Now that we have this data, how can it be useful? Well consider a word like jazzy. How valuable is this word given our matrix? Let's just consider the J starting the word. Using our matrix we can count how many words don’t start with a ‘J’ all of these words will be eliminated by our guess. Also any word that doesn’t contain a ‘J’ in the four other possible positions will also be eliminated.
This algorithm is not perfect. There are multiple ways in which the word with the highest score could not actually be the best guess; thus, we create a new list of the 20 or so words with the highest scores. Through many tests, I observed that this list would always have the best guess. Thus we can run our check on these twenty words rather than every word decreasing our growth rate from an unreasonable n^3 to a very manageable n^2.
Now that we have chosen our best first guess, we enter wordle's output. One example would be the string GGBBY. This would represent if the first two letters being in their correct position, the third and forth being incorrect, and the fifth being in the wrong position. We then scrub through our list of possible final answers an eliminate any that do not follow this pattern, and repeat our previous steps with our revised list. This method lead to a very efficient algorithm that can solve any wordle game in an average of 3.3 guesses. If I were to further develop this project I would want to look into how looking beyond a single guess will effect the efficiency. It is possible that there are sometimes indirect guessing paths, where the first guess might not be 'optimal' by the defintion I have set here, yet will ultimately lead to a lower guess rate on average. I have not thought of a way of implementing this without drastically decreasing efficiency.