Finding words within a character set

ruhk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Finding words within a character set by nightwatch (Scribe) on Jul 03, 2004 at 00:10 UTC
Check out the CPAN module String::Combination, which will do exactly what you want. `#!/usr/bin/perl use strict; use warnings; use String::Combination qw(combination); open DICT, 'english-words.txt' or die "Can't open dictionary: $!"; my @dict = <DICT>; chomp @dict; my %dict = map { (lc($_) => 1) } @dict; print "Letters? "; my $letters = lc <STDIN>; chomp $letters; my @results; for (my $i = 2; $i <= length $letters; $i++) { push @results, grep { $dict{$_} } combination $letters, $i; } print "@results"; __END__ Output: Letters? tar at art rat tar` [download]	[reply] [d/l]
Re: Finding words within a character set by pbeckingham (Parson) on Jul 02, 2004 at 20:52 UTC
Here is an algorithm - certainly not the best - that would work: If you could load your dictionary into memory: `my @words = <DICTIONARY>;` [download] Then it is a simple process of taking your input letters, say `art` and doing: `#! /usr/bin/perl -w use strict; my $input = 'art'; # Read words from a file instead. my @words = ('tar', 'rat', 'fake', 'at', 'unknown'); my $regex = qr{[$input]+}; my @matches = grep /^$regex$/, @words; print "$_\n" for @matches;` [download] Beware of: upper/lower case in input and in the dictionary, the massive amount of memory the words array will consume...	[reply] [d/l] [select]
Re: Finding words within a character set by Ven'Tatsu (Deacon) on Jul 02, 2004 at 21:49 UTC
If your given a letter can it be matched more than once if it's only given once, or can it only be matched as many times as it's given? ie a,e,h,r,t would match 'hater', but would it match 'hatter' or would it need to be a,e,h,r,t,t to match 'hatter'? If the first is true pbeckingham's code will work, if not this may do what your looking for. use strict; use warnings; my %words = map { $_ => countLetters($_)} qw/tar rat at attitude tap o +ther/; my $input = 'art'; my $input_letters = countLetters($input); my @matched_words; WORD: for my $word (sort keys %words) { for my $letter (keys %{$words{$word}}) { next WORD unless exists($input_letters->{$letter}) && $input_l +etters->{$letter} >= $words{$word}{$letter}; } push @matched_words, $word; } print "Matched words: @matched_words\n"; sub countLetters { my $word = shift; my %letters; $letters{$_}++ for split //, $word; return \%letters; } [download]	[reply] [d/l]
Re: Finding words within a character set by pelagic (Priest) on Jul 02, 2004 at 22:01 UTC
I pinched the following from tye's Permuting with duplicates and no memory sub nextPermute(\@) { my( $vals )= @_; my $last= $#{$vals}; return "" if $last < 1; # Find last item not in reverse-sorted order: my $i= $last-1; $i-- while 0 <= $i && $vals->[$i] ge $vals->[$i+1]; # If complete reverse sort, we are done! return "" if -1 == $i; # Re-sort the reversely-sorted tail of the list: @{$vals}[$i+1..$last]= reverse @{$vals}[$i+1..$last] if $vals->[$i+1] gt $vals->[$last]; # Find next item that will make us "greater": my $j= $i+1; $j++ while $vals->[$i] ge $vals->[$j]; # Swap: @{$vals}[$i,$j]= @{$vals}[$j,$i]; return 1; } @A= sort qw[a r t]; do { print @A, "\n"; } while( nextPermute(@A) ); __OUTPUT__ art atr rat rta tar tra [download] The output does not contain words like "at" but this might not be a problem if you do the pattern matching accordingly. pelagic	[reply] [d/l]
Re: Finding words within a character set by sleepingsquirrel (Chaplain) on Jul 02, 2004 at 23:58 UTC
You might also want to give Games::Scrabble a look. -- All code is 100% tested and functional unless otherwise noted.	[reply]
Re: Finding words within a character set by QM (Parson) on Jul 03, 2004 at 01:46 UTC
Another approach is to note that multiple words map onto one set of characters. Load your dictionary, sorting each word by character, and store it all in a hash, like this: `my %anagrams; while (defined( my $dict_word = <> ) ) { my $sorted = sort split //, $dict_word; push @{$anagrams{$sorted}} = $dict_word; }` [download] The keys are the first asciibetical permutation of each word set. Next, since you want words that are a subset of the given char set, you'll have to run through the combinations of the char set, and check if there's a key that matches, giving you an array of words for each match. I leave that as an exercise for OMAR. -QM -- Quantum Mechanics: The dreams stuff is made of	[reply] [d/l]
Re: Finding words within a character set by bageler (Hermit) on Jul 03, 2004 at 18:47 UTC
I was playing with the widgetwords konfabulator widget and wrote a script that calculates the top scoring words for each round. The way the game is played is you get a set of letters and have to come up with as many dictionary words as possible with those letters. Of course, to calculate the top scoring words requires a search like you are requesting. My main constraint is that the word has to be between 3 and 8 characters. #!/usr/bin/perl # authored by Josh Goldberg <josh@3io.com> # Jan 10, 2004 # # ARGV[0] is a string of letters, ARGV[1] is a dictionary file. # the script searches the dictionary for all combinations/permutations # of the ARGV[0] that have at least three characters and no character +is # repeated. # Originally authored for use with the widgetWords konfabulator widget +. use Algorithm::Permute; $\|=1; $resultstoprint = 7; $defaultDictionary = "joshWordList.txt"; @tilePoints = ( 100,300,300,200,100,400,200,400,100,800,500,100,300,10 +0,100,300,1000,100,100,100,100,400,400,800,400,1000 ); $vowels = ( 65,69,73,79,85,89 ); $common = ( 84,78,83,72,82,68,76 ); sub combinations { my @list= @_; my @pick= (0) x @list; return sub { my $i= 0; while( 1 < ++$pick[$i] ) { $pick[$i]= 0; return if $#pick < ++$i; } return @list[ grep $pick[$_], 0..$#pick ]; }; } $file = $ARGV[1] \|\| $defaultDictionary; die "missing dictionary" unless -e $file; @letters = split //,lc $ARGV[0]; # permute all combinations of 3-8 letters $combinations = combinations(@letters); while (@comb = $combinations->() ) { next unless scalar @comb > 2; $p = new Algorithm::Permute(\@comb); while (@res = $p->next) { local $"=''; $wordlist{"@res"} = 1; } } open LIST, "<$file"; while (<LIST>) { chomp; $dict{$_} = 1; } foreach $word (keys %wordlist) { if (exists $dict{$word}) { push @matches, $word; @res = split //,$word; $score = 0; for (@res) { $score += $tilePoints[ord(uc $_)-65]; } $len = scalar @res; $score += $len * 50; $score += 400 if $len == 8; $wordlist{$word} = $score; } } close LIST; print "top Eight Words:\n"; @sorted = reverse sort {$wordlist{$a} <=> $wordlist{$b} } @matches; for (@sorted) { if ($resultstoprint > 0) { last if $top++ > $resultstoprint; } @res = split //,$_; print "$_: $wordlist{$_} points\n"; } exit 0; [download]	[reply] [d/l]
Re: Finding words within a character set by davidj (Priest) on Jul 03, 2004 at 19:45 UTC
A while ago I wrote a little program that does a variety of things with anagrams. What follows is a stripped down version that simply prints all the anagrams of the indicated word. Note: if desired, size restrictions can be applied to the word in three ways: 1) a simple integer - will return anagrams of that size 2) an integer range (3-5) - will return anagrams whose length is within the range (in this case 3,4,5) 3) a mathematical expression (such as >4, <5, !=6) - the anagrams returned will be of size >4, size <5, and size !=6. #!/usr/local/bin/perl use strict vars; my ($w, $size, $entry, $eval, $cont, $line, $word); my (@combs, @word); my (%anagrams, %bysize, %combs); #read dictionary file my $file = shift; open(FILE, "$file") or die "cannot open file $file: $!\n"; print "reading dictionary...\n"; push( @{ $anagrams{join("", sort( split(//,$w) ) )} }, $w ) while( cho +mp($w = <FILE>) ); close(FILE); print "enter word and size(q to quit) "; chomp($cont = <STDIN>); while($cont !~ m/^q$/i) { %combs = @combs = %bysize = (); ($word, $size) = split(" ", $cont); if($size =~ m/^\d$/) { $eval = '$combs{$entry} = 1 if ($entry =~ m/^\w{' . $size . '} +$/ )'; } elsif($size =~ m/-/) { my ($fnum, $snum) = $size =~ m/(\d+)\s-\s(\d+)/; $eval = '$combs{$entry} = 1 if ($entry =~ m/^\w{' . $fnum . ', +' . $snum . '}$/)'; } elsif($size ne "") { $eval = '$combs{$entry} = 1 if length($entry)' . " $size"; } #find all combinations in word @word = split(//, $word); foreach (&combinations(@word)) { $entry = join("", sort(@$_)); if($size eq "") { $combs{$entry} = 1 if length($entry) > 0; } else { eval $eval; } } @combs = sort(keys %combs); # store each word in bysize hash arranged by size foreach (@combs) { push( @{ $bysize{ length($_) } }, @{ $anagrams{$_} } ) if @{ $ +anagrams{$_} }; } # print words &print_set(); print "enter word and size(q to quit) "; chomp($cont = <STDIN>); } sub print_set() { my ($key, $count, $total); my @words; foreach $key ( sort { $a <=> $b } keys %bysize ) { @words = sort( @{ $bysize{$key} } ); print "\nwords of size $key:\n"; $count = 0; foreach (@words) { $count++; print "$_ "; print "\n" if $count % 7 == 0; } $total += $count; print "\n"; } print "\ntotal words: $total\n"; print "\n\n"; } sub combinations() { return [] unless @_; my $first = shift; my @rest = combinations(@_); return @rest, map { [$first, @$_] } @rest; } exit; [download] sampe run using the scrabble enable.txt dictionary D:\PerlProjects\anagram>anagram.pl enable.txt reading dictionary... enter word and size(q to quit) perlmonks 3-5 words of size 3: elk elm els ems ens eon ern ers ken kep kop kor kos lek lop mel men mol mon mop mor mos nom nor nos oes oke ole oms one ons ope ops ore ors ose pen per pes pol pom pro rem rep res roe rom sel sen ser sol son sop words of size 4: elks elms enol eons epos erns eros kelp kemp keno kens keps kern knop koel kops kore kors leks leno lens lone lope lops lore lorn lose mels meno merk merl moke mole mols monk mons mope mops more morn mors mosk noel noes nome noms nope norm nose okes oles omen omer ones open opes ores orle pens peon perk perm peso poem poke pole pols pome poms pone pons pore pork porn pose prom pros rems repo reps roes role romp roms rope rose skep sloe slop soke sole some sone sore sorn words of size 5: enols enorm enrol kelps kemps kenos kerns knops knosp koels krone lemon lenos loner loper lopes lores loser melon merks merls meson mokes moles monks moper mopes morel mores morns morse nerol noels nomes norms omens omers opens orles pelon peons perks perms plonk poems poker pokes poler poles pomes pones pores porks porns poser proem prole proms prone prose repos roles romps ropes senor skelm skelp slope smerk smoke snore sorel sperm spoke spore total words: 223 enter word and size(q to quit) q [download] The other things the fuller program does is 1) allow for inclusion/exclusion of specific letters 2) allow for returning anagrams that contain a specific regex pattern If you are interested in seeing that, let me know and I will post the fuller version. davidj	[reply] [d/l] [select]
Re: Finding words within a character set by Anonymous Monk on Jul 04, 2004 at 03:58 UTC
I would use the dictionary as a database pattern match. If a user types a word starting with A,then B, then L, then E. Immediately in the program I would select words starting with A into a variable array then proceed matching the rest of the criteria within the array, once the user finish typing the rest of the word. I guess your dictionary would be well suit it that you will not have trouble matching words against it.	[reply]