Fun with Word Cubes

July 26th, 2010 | by emontero |

Programming Praxis (PP) recently published a puzzle that piqued my interest. Basically, the challenge is to code a solver for Word Cubes. Since I’ve always been a Word Cube aficionado, I decided to take a crack at the problem and see if I can come up with a decent solution.

Before we move forward, I must provide some context for the uninitiated. Here’s the problem statement from PP:


Word cube is game in which players form words from the nine letters in a cube. Words must have four or more letters and must use the central letter from the cube; at least one word will use all nine letters in the cube. The player who forms the most words wins. Many newspapers publish a word cube on their puzzle page, and Stealthcopter publishes a word cube on line daily. Wikipedia describes word cubes under the caption “word polygon.” There are twelve words formed from the word cube at right: bonnie, bunion, coin, concubine, conic, cubic, ennui, icon, nice, nine, nuncio, and union.

Your task is to write a program that finds all matching words for a given word cube. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Now that we know what needs to be done, let’s focus our attention on a possible solution. But first, some essential mathematical concepts are in order.

Combinatorics 101

Math -> Maht -> Mhta -> Mhat

What do all the preceding words have in common? They seem to be permutations of a finite set (i.e. the set comprised of the letters “M”, “a”, “t”, and “h”). A permutation is nothing more than a given arrangement of a set’s elements. The notion is important because before we can exhaustively find all the possible English words that can be formed with the letters of a Word Cube, we need to compute the permutations for each combination of 4-, 5-, 6-, 7-, 8- and 9-letter words.

If you still remember anything from your discrete mathematics courses, you may know the formula to compute the number of possible permutations for a given set is the factorial of its size. Namely, if I wanted to know the number of different permutations that can be formed with the letters of the word “Math”, I’d do this:

4 x 3 x 2 x 1 = 24

We can verify this fact by generating all the possible groupings:

Math
Maht
Mtah
Mtha
Mhat
Mhta
aMth
aMht
atMh
athM
ahMt
ahtM
tMah
tMha
taMh
tahM
thMa
thaM
hMat
hMta
haMt
hatM
htMa
htaM

As it can be easily seen, the total number of groupings is indeed 24. The iterative multiplication of all the integers from one to the set’s size, inclusive,  gives us the number of different permutations without repetition. Incidentally, that’s exactly what the factorial is. The factorial is algebraically expressed as n! (e.g. 3! = 6, 4! = 24 and 6! = 720). We’d read “3!” as “three factorial”.

A combination is a similarly simple concept to grok. If I asked you to compute all the possible combinations of the word “Math”, choosing 3 characters at a time, you’d show me this:

Mat
Mah
Mth
ath

On the other hand, if I asked the same question, but instead I’d said “choosing 4 characters at a time”, you’d have showed me this:

Math

Do you see a pattern at play here? Combinations are arrangements of a set’s elements without repetition. Those arrangements are formed by taking characters out of the original set in accordance with the desired size of the resulting groupings (e.g. “choosing 3 characters at a time”, “choosing 4 characters at a time”, et cetera). In order to solve a word cube, we need to create a set with all the combinations that we’re interested on, and subsequently permute those combinations in the hopes of finding legitimate English words.

NOTE: This is a superficial treatment of two profound and very interesting concepts. If you’d like to go deeper down the rabbit hole, ask almighty Google.

NOTE 2: If you’d like to see how to compute the permutations and combinations of a set in Java code, you can read Michael Gilleland’s articles here and here. As you shall notice upon reviewing my code, I’m using his solutions extensively.

Implementation Details

Now that we’ve outlined the “hardcore” math behind the problem, let’s get to it. These are the steps we need to take if we are to successfully solve a Word Cube:

  1. Find all the possible words that can be put together using the characters from the 3 x 3 grid.
  2. Select the words that have four or more characters and contain the letter from the center of the grid.
  3. Select the words that comply with the above criteria and can also be found in an English dictionary.
  4. Display those words.

You can read the source code in its entirety here. For step (1), we need to generate all the possible combinations of the requested size (i.e. 4, 5, 6, 7, 8 and 9) and then permute each one of them. (2) is telling us that all the legal words must have the character that’s in the center of the grid. That’s an optimization right there. We don’t have to permute a combination which doesn’t have that letter. The method generateWords contains the code that’s performing these two tasks.

For (3) and (4), we need to validate and display all the generated English words. *nix operating systems (i.e. Unix-like operating systems) come with a standard English dictionary located at /usr/share/dict. Both Ubuntu 10.14 and OS X 10.5 have it. If you’re running Ubuntu, you’ll have to modify the enumeration FILES in the code so that it has the correct file names. If you’re on OS X 10.5, you have nothing to worry about. As soon as we get a set with all the generated words, checking if a word belongs to the dictionary is trivial. This is exactly what the method check does:

1
2
3
4
5
6
7
    private static void check(Set<String> words, Set<String> dict) {
        for (String word : words) {
            if (dict.contains(word)) {
                System.out.println(word);
            }
        }
    }

Here’s how the solver behaved for some cubes I threw at it:

Word Cube Execution Time (seconds)
iereprtem 1.516556
iymardida 1.625368
rirsgnaaa 1.549554
soasrcsbr 1.657542
odcaaldec 1.749831

Remember that performance may vary depending on your computer’s processing power and workload during execution. Considering that Webster’s Second International Dictionary (the default English dictionary bundled with OS X 10.5) contains 234,936 words, less than two seconds per cube is nothing to be ashamed of.

Run it!

If you’re interested in giving the solver a try, make sure you download the code and run it like this from the command line:

java progpraxis.WordCubeSolver ncbcioune /usr/share/dict

TECHNICAL NOTE: JRE 1.6 comes with a predefined memory allocation of 64 MB of RAM. This basically means your Java application has 64 MB of RAM available to do its thing and no more. If you exceed this limit knowingly or otherwise, you’ll get a big, intimidating exception (OutOfMemoryError). I completely disregarded the space complexity of my implementation (English translation: I didn’t pay attention to memory usage). Consequently, it’s very likely you’ll have to tweak your VM to allow for the successful execution of the code. If you run into any memory hurdles, just run the solver like this:

java -Xmx128M progpraxis.WordCubeSolver ncbcioune /usr/share/dict

Notice how we’re now telling the VM that it should allocate 128 MB of RAM instead of the default value. That should give the solver more than enough memory to complete its job.

TECHNICAL NOTE 2: Yes, this is a brute-force algorithm in all its glorious splendor. Aren’t those the best? If you come up with a faster, more efficient solution, please, don’t keep it to yourself. Spread the love! :)

Post a Comment