vnpenguin has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, From a set of letters, for example "OLHOSC", how to found any meaning word by using all letter? "SCHOOL" is one of these word in this case. I would like to do it with Perl but I have no idea about the algorithme for this problem. Any suggestion ? Thank you in advance, Regards,

Replies are listed 'Best First'.
Re: Found a word from a set of letters
by Limbic~Region (Chancellor) on Jan 29, 2008 at 18:41 UTC
    vnpenguin,
    Assumption: You have a dictionary file that has every word possible on a single line:
    #!/usr/bin/perl use strict; use warnings; my $dictionary = 'words.txt'; open(my $fh, '<', $dictionary) or die "Unable to open '$dictionary' fo +r reading: $!"; my %data; while (<$fh>) { chomp $_; my $str = join '', sort map lc(), split //; push @{$data{$str}}, $str; } my $letters = $ARGV[0] or die "Usage: $0 <set_of_letters>"; $letters = join '', sort map lc(), split //, $letters; if (! exists $data{$letters}) { print "Could not find any matching words\n"; } else { for my $word (@{$data{$letters}}) { print "$word\n"; } }

    Update: This code is untested. It stores everything in memory, but as kyle shows - that can easily be modified if needed. The algorithm essentially changes all words in the dictionary to the same format so you can perform a hash lookup. For instance:

    art = art tar = art rat = art

    Cheers - L~R

      Thank you !
Re: Found a word from a set of letters
by kyle (Abbot) on Jan 29, 2008 at 18:44 UTC

    This solution also depends on a dictionary, but it doesn't read the whole thing into memory. It tests each line of it as it reads through.

    my $dictionary = '/usr/share/dict/words'; # input letters, changed to lowercase my @letters = split //, lc shift @ARGV; open my $dict_fh, '<', $dictionary or die "Can't read dictionary '$dictionary': $!"; WORD: while ( my $word = lc <$dict_fh> ) { chomp $word; # $word is lowercase with no newline my $orig_word = $word; # for each letter in the search set, for my $letter ( @letters ) { # remove that letter from the dictionary word # and skip to the next word if it's not found next WORD if ! ($word =~ s/$letter//); } # if all the dictionary word's letters were used, # print out the original dictionary word. print "$orig_word\n" if $word eq ''; } close $dict_fh or die "Can't close: $!";

    Update: Added some comments to describe the algorithm (thanks to Limbic~Region for the suggestion). Also, this code is tested (with the case the OP specified).

      Thank you so much, kyle ! Your code works like a charm !!! Regards,
Re: Found a word from a set of letters
by ikegami (Patriarch) on Jan 29, 2008 at 18:51 UTC

    Sort the letters, sort the letters of the words of a dictionary file, and match them up.

    sub canonize { return join '', sort split //, lc $_[0]; } my %dict; { open(my $fh, '<', $dict_file) or die("Unable to open dictionary file \"$dict_file\": $!\n"); while (<$fh>) { chomp; push @{ $dict{canonize($_)} }, $_; } } my $jumble = 'OLHOSC'; $jumble = canonize($jumble); if (exists($dict{$jumble})) { print("$_\n") for @{ $dict{$jumble} }; } else { print("No matches found.\n"); }

    You might want to look at using a Trie for a more memory efficient structure than a hash for this purpose.

    If you only want one match, and you're only working on one word, you can just process the dictionary file until you find a match.

    sub canonize { return join '', sort split //, lc $_[0]; } my $jumble = 'OLHOSC'; open(my $fh, '<', $dict_file) or die("Unable to open dictionary file \"$dict_file\": $!\n"); my $found = 0; $jumble = canonize($jumble); while (<$fh>) { chomp; if (canonize($_) eq $jumble) { print("$_\n"); $found = 1; last; } } if (!$found) { print("No matches found.\n"); }

    Update: The others inlined canonize. That's probably a good idea since function calls are expensive and you'll be canonizing a lot of words.

Re: Found a word from a set of letters
by mwah (Hermit) on Jan 29, 2008 at 19:11 UTC

    There have been already many solutions w/word list look up posted to this thread. I'll try to give a 'hands on' example (if you work on a windows box).

    First, you could download a dictionary from

    ftp://ftp.ox.ac.uk/pub/wordlists/american/dic-0294.tar.gz

    and put it in a directory, eg. "c:\dic-0294" (extract it *there*, winrar or winzip should work)
    You'll see ~30 different files, each containing words of a defined length.

    Then, create a Perl script like the ones that have been posted in the thread or use one that is tailored for this dictionary, like:

    use strict; use warnings; my $word = shift || 'OLHOSC'; my $fname = sprintf 'length%02d.txt', length $word; # eg. length06.tx +t my $word_key = join '', sort split //, lc $word; # sort lowercase +letters open my $fh, '<', $fname or die "sorry, no such word list available, +Bye!"; while( my $line = <$fh> ) { chomp $line; my $dict_key = join '', sort split //, lc $line; print "$word => $line\n" if $dict_key eq $word_key # bail out here - if only one match is needed } close $fh;

    From the *length* of the looked-up word, the name of the dictionary file is built. If you extracted the above archive, then the files with word lengths between 2 and 32 should reside in the current directory.

    Regards

    mwa

      Thank you for your suggestion, Regards,
Re: Found a word from a set of letters
by hipowls (Curate) on Jan 29, 2008 at 20:40 UTC

    If you are going to be frequently searching for anagrams then it is worth while splitting it into two scripts, the first reads the dictionary, generates the look up table and then dumps the table out to disk as a perl data structure. The second script reads in the data structure and looks up the word. The advantage is that you only generate the lookup table when the dictionary is updated.

    .

    Script to create the look up table.

    use strict; use warnings; use YAML::Syck qw(DumpFile); my $dict_file = 'EN_AU.dic'; open my $dict, '<', $dict_file or die "Unable to open '$dict_file' for + reading: $!"; my %word_table; while (<$dict>) { chomp $word; my $key = join '', sort map lc(), split //, $word; push @{$word_table{$key}}, $word; } DumpFile( 'dict.yml', \%word_table );
    and to find the anagrams
    use strict; use warnings; use YAML::Syck qw(LoadFile; # slurp all arguments my @letters = split //, @ARGV; die usage unless @letters; my $key = join '', sort @letters; my $lookup_word = LoadFile('dict.yml'); if ( exists $lookup_word->{$key} ) { my @words = sort @{ $lookup_words{$key} }; print "Found words: @words\n; } else { print "No words found\n"; } sub usage { ... }