in reply to Adventures in multilingualism

A few weeks ago, someone I know was trying to solve a somewhat interesting problem (interesting to me, at least). They wanted to know all the words that could be formed from all the combinations of the last 4 digits of a phone number.

When I first read this, I thought you meant actual words, in a dictionary, but it sounds like you mean combinations. Just an aside, if someone wants to limit this to actual words, it's pretty trivial with something like...

perl -lne 'if (/^[a-pr-y]{4}$/i) {($a = lc $_) =~ tr/a-pr-y/222333444555666777888999/; print "$_: $a"}' < /usr/share/dict/words

If the dictionary has more than one entry for the same word (e.g. capitalized and lowercase), you'll see it more than once, so collecting the results in a hash keyed on the lowercased word might help.
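
A rough sketch of that hash idea, in case it is useful. It assumes the same letter set as the one-liner (a-p and r-y, so no q or z), and the names word_to_digits and group_by_digits are made up for illustration:

```perl
use strict;
use warnings;

# Map a word to keypad digits. Returns undef unless the word is exactly
# four letters drawn from a-p and r-y (assumption: q and z have no key).
sub word_to_digits {
    my ($word) = @_;
    my $w = lc $word;
    return unless $w =~ /^[a-pr-y]{4}$/;
    (my $digits = $w) =~ tr/a-pr-y/222333444555666777888999/;
    return $digits;
}

# Group words by digit string. The inner hash is keyed on the lowercased
# word, so "Mark" and "mark" collapse into a single entry.
sub group_by_digits {
    my %by_digits;
    for my $word (@_) {
        my $digits = word_to_digits($word);
        next unless defined $digits;
        $by_digits{$digits}{ lc $word } = 1;
    }
    return \%by_digits;
}

# Small in-line list for demonstration; the dictionary lines would go here.
my $groups = group_by_digits(qw(Mark mark lark perl queue));
for my $digits (sort keys %$groups) {
    print "$digits: ", join(' ', sort keys %{ $groups->{$digits} }), "\n";
}
# prints:
# 5275: lark
# 6275: mark
# 7375: perl
```

Grouping by digit string also shows at a glance which numbers spell more than one word.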

Re^2: Adventures in multilingualism
by revdiablo (Prior) on May 04, 2005 at 18:56 UTC
    When I first read this, I thought you meant actual words, in a dictionary, but it sounds like you mean combinations.

    Well, generating words was the original goal. Combinations are a step on the way to that goal. Once the combinations are generated, grepping for real words (e.g. using /usr/share/dict/words) is easy. But you have flipped the algorithm around, and used the words to generate the numbers, which is actually quite nice. I hadn't thought about it that way. Many thanks for the reply!
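
For comparison, the combinations-first direction (expand the digits, then keep whatever is in a word list) might be sketched like this. The keypad table assumes the same letter set as the one-liner (no q or z), and combinations and real_words are hypothetical names, not code from the original post:

```perl
use strict;
use warnings;

# Letters per digit (assumption: 7 is p/r/s and 9 is w/x/y, no q or z).
my %letters = (
    2 => [qw(a b c)], 3 => [qw(d e f)], 4 => [qw(g h i)], 5 => [qw(j k l)],
    6 => [qw(m n o)], 7 => [qw(p r s)], 8 => [qw(t u v)], 9 => [qw(w x y)],
);

# Expand a digit string into every letter combination.
# Returns an empty list if any digit (such as 0 or 1) carries no letters.
sub combinations {
    my ($digits) = @_;
    my @combos = ('');
    for my $d (split //, $digits) {
        my $choices = $letters{$d} or return;
        @combos = map { my $prefix = $_; map { $prefix . $_ } @$choices } @combos;
    }
    return @combos;
}

# Keep only the combinations that appear in the supplied word list.
sub real_words {
    my ($digits, @dictionary) = @_;
    my %known = map { lc($_) => 1 } @dictionary;
    return grep { $known{$_} } combinations($digits);
}

print join(' ', real_words('6275', qw(mark lark Mark))), "\n";   # prints "mark"
```

Four digits give at most 3**4 = 81 combinations, so either direction is cheap; the difference is mostly in which one is easier to write.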

      This will work for well-known words. The problem with this method is that many four-letter combinations are not English words but are still meaningful as contractions or acronyms: MUFC => 'Manchester United Football Club', PLNE => 'plane'. There is also L33T, which bends the rules by using numbers. Just call 0800 123-SK8R for your local half pipe...
        You are right. I mainly presented this as an easy hack for getting 90% of the way there. Even with the examples you give, if I have my own dictionary, or generate a special one ahead of time, this procedure will work fairly well. For example, I can generate abbreviations from an existing dictionary without too much effort, which will cover your PLNE example. (L33T requires some work, but if you use it I pity you.) The dictionary your method uses is in a human's head. Sometimes these work well, sometimes they don't.

        Re: MUFC -- I'll probably get downvoted/flamed for this, but I'm a reds fan. To some that is truly a 4-letter word. :-)