escherist has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I just have a quick question and I can't believe I can't figure it out on my own. I'm working on a simple word game, and what I need perl to do is open up a dictionary file (I've already limited the dictionary wordlist to 6 letter words) and then based on a given 6 character input, output all words from the dictionary list that can be made from those letters, of course in any order.

What I have right now just for testing the code snippet...
use strict; use warnings; my $input = <STDIN>; my @inputarray = split('',$input); my $file = 'wordlist.txt'; open(INFO, $file); my @lines = <INFO>; close(INFO); my $tacobell; foreach $tacobell (@lines) { if ($tacobell =~ /$inputarray[0]$inputarray[1]$inputarray[2]$input +array[3]$inputarray[4]$inputarray[5]/) { print $tacobell; } } exit();
After having read through about 15 regex tutorials, I'm sure there's an easy way to do this, I just can't figure it out. I think my code looks for the sequence of characters in that order ... I just need it from any order.

Also, I know there are some different ways of formatting regex to decrease cpu time where you don't have to recalculate certain things, so hopefully if there's a solution to this problem it'll be somewhat cpu friendly :)

I'd appreciate any help yall can offer!

Thanks,
Alan

Replies are listed 'Best First'.
Re: Simple regex wordlist question
by Zaxo (Archbishop) on Sep 11, 2007 at 05:24 UTC

    That's an anagram problem, which has a canonical answer. First absorb your dictionary in a hash:

    my %words; { open my $fh, '<', '/path/to/dict.txt' or die $!; while <$fh> { chomp; my $key = join '', sort split ''; push @{$words{$key}}, $_; } }

    That collects your data, now just rearrange your input word to make a key and print the associated words:

    my $word = <>; # or however you get it chomp; $word = join '', sort split '', $word; print "@{$words{$word}}\n";
    Alphabetizing and mushing together the characters of a word provides a key for anagrams..

    If this is used a lot or the dictionary is big, you may want to keep the "hash" in a database.

    After Compline,
    Zaxo

Re: Simple regex wordlist question
by ikegami (Patriarch) on Sep 11, 2007 at 04:46 UTC

    What you want is "The currect dict word contains the first letter, the currect dict word contains the second letter, ... and the currect dict word contains the last letter." Regexp are not good at doing "AND". It's usually better to do multiple matches, and AND the results of the matches.

    Update: No, I suppose that's not quite right. You want "All letters of the currect dict word is in the specified word." A regexp character class can handle that quite well.

    use strict; use warnings; my $input = <STDIN>; my ($re) = map qr/^[$_]*$/, join '', map quotemeta, $input =~ /(.)/g; my $dict = 'wordlist.txt'; open(my $dict_fh, '<', $dict) or die("Unable to open dictionary file: $!\n"); while (<$dict_fh>) { print if /$re/; }

    Update: Note that the OP didn't mention each letter could only be used once, and he didn't mention that all the letters needed to be used, so I didn't add those restrictions.

      I think he is working on the puzzle where you think of as many words that can be spelled using letters in the given word. But what about quantity?

      I've already given ++ to Zaxo and CountZero for their anagram solutions, but I'm not sure now that's the point

      An example would help, but if I'm right, he wants, given 'PerlMonks' a list like:

      • Perl (of course)
      • role
      • Poke
      • sole
      • solemn
      But would not like 'reel', as there are not two 'e's in the input.

      Is this what johngg has?

      perhaps vcTheGuru is closer to right with Math::Combinatorics, except maybe using the combine function looping from 1..($#input+1).


      I humbly seek wisdom.

        The following needs a great deal of polish. I took ikegami's basic regexp as a superset of the words you're looking for. Then I horribly misused it without all the safety that ikegami was right to include (qr// syntax, etc.).

        I also ended up turning your semantic inside out in that the dictionary file is now on the command line and the starting word is hard coded; sorry about that.

        The guts of this solution involve counting letters in a regexp:


        I humbly seek wisdom.
Re: Simple regex wordlist question
by CountZero (Bishop) on Sep 11, 2007 at 05:38 UTC
    Sometimes you have to take a step back to see the big picture.

    My solution would be to pre-process the dictionary list and make it into a "key = value" list. The key is the sorted order of the value: so "Apache" becomes: "aacehp = Apache". Splitting and sorting the words in your dictionary file is as easy as sort split '', lc $word where $word contains a word in your file.

    The problem then becomes as easy as splitting and sorting your input in the same way and checking your pre-processed list for a match: if ($guess eq $try_this){...}.

    If the list itself is sorted a binary search would be blindingly fast even on huge dictionaries.

    Granted the pre-processing takes some time but it is a one time investment only and can of course be saved on persistent storage for a next time use.

    Update: Zaxo beat me to the punch. I blame our new kitten Hobbes which badly needed some petting.

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Simple regex wordlist question
by atemon (Chaplain) on Sep 11, 2007 at 05:04 UTC

    Your code is checking each line in the file with the input string. For that another way to do is, get all permutations from the input string, then compare each line in the file with all the permutations you got.

    To get all permutations, you can use Math::Combinatorics. Get all permutations into an array and you can use grep to compare.

    I have updated your code and is working fine for me. The code I updated is

    #!/usr/bin/perl use strict; use warnings; use Math::Combinatorics; #to get all combinations my $input = <STDIN>; my @inputarray = split('',$input); my $file = 'wordlist.txt'; open(INFO, $file); my @lines = <INFO>; close(INFO); my @permutations= map { join "", @$_ } permute(@inputarray );# Get all + permutations of input string my %dup; @dup{@permutations}= (); my @tacobell= grep { exists $dup{$_} } @lines; # Check if any of the c +ombinations exists in file print join ("\n",@tacobell); exit();

    Update: Fixed typo.

    Cheers !

    --VC



    There are three sides to any argument.....
    your side, my side and the right side.

      This could get awfully slow. 6! is only 720, but if he went to 7! that would be 5040.

      The OP's solution is faster, and ikegami's solution is probably even more so.

Re: Simple regex wordlist question
by johngg (Canon) on Sep 11, 2007 at 11:01 UTC
    You could do it with regular expression look-aheads. You have to make sure that you cater for input that has duplicate letters, e.g. butter or banana. You can mandate the length of result words in the regex without having to cut down your dictionary file.

    use strict; use warnings; my $dictFile = q{2of12.txt}; open my $dictFH, q{<}, $dictFile or die qq{open: $dictFile: $!\n}; chomp(my @words = <$dictFH>); close $dictFH or die qq{close: $dictFile: $!\n}; my $input = shift; my $charCt = length $input; my %chars; $chars{$_} ++ for split m{}, $input; my $charsWanted = join q{}, map { qq{(?=@{ [ qq{.*$_} x $chars{$_} ] })} } keys %chars; my $wordLength = qq{(?=.{@{ [ length $input ] }}\$)}; my $rxDict = qr{(?x)^$charsWanted$wordLength}; print map { qq{$_\n} } grep { m{$rxDict} } @words;

    This seems to work quite fast, the reading of the dictionary file taking most of the time.

    Cheers,

    JohnGG

Re: Simple regex wordlist question
by escherist (Initiate) on Sep 11, 2007 at 04:48 UTC
    Anything more elegant than:
    foreach $tacobell (@lines) { if ($tacobell =~ /$inputarray[0]/) { if ($tacobell =~ /$inputarray[1]/) { if ($tacobell =~ /$inputarray[2]/) { if ($tacobell =~ /$inputarray[3]/) { if ($tacobell =~ /$inputarray[4]/) { if ($tacobell =~ /$inputarray[5]/) { print $tacobell; } } } } } } }
    I know I'm no programming expert :/
Re: Simple regex wordlist question
by klekker (Pilgrim) on Sep 11, 2007 at 06:27 UTC
    Since you have alread build an array, I'd give grep a try:
    my @matching_words = grep(/$your_regular_expression/, @lines);
    and $your_regular_expression is something like /[anshef]{6}/i (where 'anshef' is an example for your six valid characters for the word).

    k
Re: Simple regex wordlist question
by hangon (Deacon) on Sep 11, 2007 at 06:44 UTC

    Here's how I would do it, with just a modification of your regex.

    use strict; use warnings; my $input = <STDIN>; chomp $input; my $file = 'wordlist.txt'; open(INFO, $file); my @lines = <INFO>; close(INFO); for my $word (@lines){ if ($word =~ /^[$input]+$/){ print "$word\n"; } }

    Udate: I may have misinterpreted the OP. This obviously will not work if all six input characters must be used.

Re: Simple regex wordlist question
by Anonymous Monk on Sep 11, 2007 at 18:44 UTC
    Thank you guys for all of the help in getting this fixed up. I started playing with the Math::Combinatorics method before I saw any of the other posts then my gf made me go to bed so I haven't had a chance to play with them too much :/

    This is what I have right now though. Something still isn't quite right with the matching from the dict file (I changed it to match 3-6 letter words from a full dictionary instead fyi). The @permutationsX arrays do in fact hold the different permutations of 6choose(3-6) but there's some problem matching the words to those in the dictionary file.

    #!/perl use strict; use warnings; use Math::Combinatorics; my $input = <STDIN>; chomp($input); my @inputarray = split('',$input); my $file = 'wordlist.txt'; open(INFO, $file); my @lines = <INFO>; close(INFO); chomp(@lines); print "\n\n\n\nGenerating Wordlist from \'$input\'...\n"; print "----------\n"; #find all 3 letter words and print out my @permutations3 = map { join "", @$_ } combine(3,@inputarray); my %dup3; @dup3{@permutations3} = (); my @tacobell3 = grep { exists $dup3{$_} } @lines; foreach (@tacobell3) { print "$_\n"; } #find all 4 letter words and print out my @permutations4 = map { join "", @$_ } combine(4,@inputarray); my %dup4; @dup4{@permutations4} = (); my @tacobell4 = grep { exists $dup4{$_} } @lines; foreach (@tacobell4) { print "$_\n"; } #find all 5 letter words and print out my @permutations5 = map { join "", @$_ } combine(5,@inputarray); my %dup5; @dup5{@permutations5} = (); my @tacobell5 = grep { exists $dup5{$_} } @lines; foreach (@tacobell5) { print "$_\n"; } #find all 6 letter words and print out my @permutations6 = map { join "", @$_ } permute(@inputarray); my %dup6; @dup6{@permutations6} = (); my @tacobell6 = grep { exists $dup6{$_} } @lines; foreach (@tacobell6) { print "$_\n"; } print "----------\n\n\n"; $_ = <STDIN>; exit();
    The point Goib brought up is correct. Given the letters tuegps, I would want to output 'getups' 'getup' 'guest' 'gust' 'tug' etc. The code I have is SO CLOSE! It's just having an issue matching the words :/

    Thanks again for all the help!

    -Alan