Re: Simple regex wordlist question
by Zaxo (Archbishop) on Sep 11, 2007 at 05:24 UTC
|
That's an anagram problem, which has a canonical answer. First absorb your dictionary in a hash:
my %words;
{
open my $fh, '<', '/path/to/dict.txt' or die $!;
while <$fh> {
chomp;
my $key = join '', sort split '';
push @{$words{$key}}, $_;
}
}
That collects your data, now just rearrange your input word to make a key and print the associated words:
my $word = <>; # or however you get it
chomp;
$word = join '', sort split '', $word;
print "@{$words{$word}}\n";
Alphabetizing and mushing together the characters of a word provides a key for anagrams..
If this is used a lot or the dictionary is big, you may want to keep the "hash" in a database.
| [reply] [d/l] [select] |
Re: Simple regex wordlist question
by ikegami (Patriarch) on Sep 11, 2007 at 04:46 UTC
|
What you want is "The currect dict word contains the first letter, the currect dict word contains the second letter, ... and the currect dict word contains the last letter." Regexp are not good at doing "AND". It's usually better to do multiple matches, and AND the results of the matches.
Update: No, I suppose that's not quite right. You want "All letters of the currect dict word is in the specified word." A regexp character class can handle that quite well.
use strict;
use warnings;
my $input = <STDIN>;
my ($re) = map qr/^[$_]*$/,
join '',
map quotemeta,
$input =~ /(.)/g;
my $dict = 'wordlist.txt';
open(my $dict_fh, '<', $dict)
or die("Unable to open dictionary file: $!\n");
while (<$dict_fh>) {
print if /$re/;
}
Update: Note that the OP didn't mention each letter could only be used once, and he didn't mention that all the letters needed to be used, so I didn't add those restrictions.
| [reply] [d/l] |
|
|
I think he is working on the puzzle where you think of as many words that can be spelled using letters in the given word. But what about quantity?
I've already given ++ to Zaxo and CountZero for their anagram solutions, but I'm not sure now that's the point
An example would help, but if I'm right, he wants, given 'PerlMonks' a list like:
-
Perl (of course)
-
role
-
Poke
-
sole
-
solemn
But would not like 'reel', as there are not two 'e's in the input.
Is this what johngg has?
perhaps vcTheGuru is closer to right with Math::Combinatorics, except maybe using the combine function looping from 1..($#input+1).
| [reply] |
|
|
The following needs a great deal of polish. I took ikegami's basic regexp as a superset of the words you're looking for. Then I horribly misused it without all the safety that ikegami was right to include (qr// syntax, etc.).
I also ended up turning your semantic inside out in that the dictionary file is now on the command line and the starting word is hard coded; sorry about that.
The guts of this solution involve counting letters in a regexp:
| [reply] [d/l] [select] |
Re: Simple regex wordlist question
by CountZero (Bishop) on Sep 11, 2007 at 05:38 UTC
|
Sometimes you have to take a step back to see the big picture.My solution would be to pre-process the dictionary list and make it into a "key = value" list. The key is the sorted order of the value: so "Apache" becomes: "aacehp = Apache". Splitting and sorting the words in your dictionary file is as easy as sort split '', lc $word where $word contains a word in your file. The problem then becomes as easy as splitting and sorting your input in the same way and checking your pre-processed list for a match: if ($guess eq $try_this){...}. If the list itself is sorted a binary search would be blindingly fast even on huge dictionaries. Granted the pre-processing takes some time but it is a one time investment only and can of course be saved on persistent storage for a next time use. Update: Zaxo beat me to the punch. I blame our new kitten Hobbes which badly needed some petting.
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
| [reply] [d/l] [select] |
Re: Simple regex wordlist question
by atemon (Chaplain) on Sep 11, 2007 at 05:04 UTC
|
Your code is checking each line in the file with the input string. For that another way to do is, get all permutations from the input string, then compare each line in the file with all the permutations you got.
To get all permutations, you can use Math::Combinatorics. Get all permutations into an array and you can use grep to compare.
I have updated your code and is working fine for me. The code I updated is
#!/usr/bin/perl
use strict;
use warnings;
use Math::Combinatorics; #to get all combinations
my $input = <STDIN>;
my @inputarray = split('',$input);
my $file = 'wordlist.txt';
open(INFO, $file);
my @lines = <INFO>;
close(INFO);
my @permutations= map { join "", @$_ } permute(@inputarray );# Get all
+ permutations of input string
my %dup;
@dup{@permutations}= ();
my @tacobell= grep { exists $dup{$_} } @lines; # Check if any of the c
+ombinations exists in file
print join ("\n",@tacobell);
exit();
Update: Fixed typo.
Cheers !
--VC
There are three sides to any argument..... your side, my side and the right side.
| [reply] [d/l] |
|
|
| [reply] |
Re: Simple regex wordlist question
by johngg (Canon) on Sep 11, 2007 at 11:01 UTC
|
You could do it with regular expression look-aheads. You have to make sure that you cater for input that has duplicate letters, e.g. butter or banana. You can mandate the length of result words in the regex without having to cut down your dictionary file.
use strict;
use warnings;
my $dictFile = q{2of12.txt};
open my $dictFH, q{<}, $dictFile
or die qq{open: $dictFile: $!\n};
chomp(my @words = <$dictFH>);
close $dictFH
or die qq{close: $dictFile: $!\n};
my $input = shift;
my $charCt = length $input;
my %chars;
$chars{$_} ++ for split m{}, $input;
my $charsWanted = join q{},
map { qq{(?=@{ [ qq{.*$_} x $chars{$_} ] })} } keys %chars;
my $wordLength = qq{(?=.{@{ [ length $input ] }}\$)};
my $rxDict = qr{(?x)^$charsWanted$wordLength};
print
map { qq{$_\n} }
grep { m{$rxDict} }
@words;
This seems to work quite fast, the reading of the dictionary file taking most of the time.
Cheers, JohnGG | [reply] [d/l] |
Re: Simple regex wordlist question
by escherist (Initiate) on Sep 11, 2007 at 04:48 UTC
|
Anything more elegant than:
foreach $tacobell (@lines) {
if ($tacobell =~ /$inputarray[0]/) {
if ($tacobell =~ /$inputarray[1]/) {
if ($tacobell =~ /$inputarray[2]/) {
if ($tacobell =~ /$inputarray[3]/) {
if ($tacobell =~ /$inputarray[4]/) {
if ($tacobell =~ /$inputarray[5]/) {
print $tacobell;
}
}
}
}
}
}
}
I know I'm no programming expert :/ | [reply] [d/l] |
Re: Simple regex wordlist question
by klekker (Pilgrim) on Sep 11, 2007 at 06:27 UTC
|
Since you have alread build an array, I'd give grep a try:
my @matching_words = grep(/$your_regular_expression/, @lines);
and $your_regular_expression is something like /[anshef]{6}/i (where 'anshef' is an example for your six valid characters for the word).
k | [reply] [d/l] [select] |
Re: Simple regex wordlist question
by hangon (Deacon) on Sep 11, 2007 at 06:44 UTC
|
Here's how I would do it, with just a modification of your regex.
use strict;
use warnings;
my $input = <STDIN>;
chomp $input;
my $file = 'wordlist.txt';
open(INFO, $file);
my @lines = <INFO>;
close(INFO);
for my $word (@lines){
if ($word =~ /^[$input]+$/){
print "$word\n";
}
}
Udate: I may have misinterpreted the OP. This obviously will not work if all six input characters must be used.
| [reply] [d/l] |
Re: Simple regex wordlist question
by Anonymous Monk on Sep 11, 2007 at 18:44 UTC
|
Thank you guys for all of the help in getting this fixed up. I started playing with the Math::Combinatorics method before I saw any of the other posts then my gf made me go to bed so I haven't had a chance to play with them too much :/
This is what I have right now though. Something still isn't quite right with the matching from the dict file (I changed it to match 3-6 letter words from a full dictionary instead fyi). The @permutationsX arrays do in fact hold the different permutations of 6choose(3-6) but there's some problem matching the words to those in the dictionary file.
#!/perl
use strict;
use warnings;
use Math::Combinatorics;
my $input = <STDIN>;
chomp($input);
my @inputarray = split('',$input);
my $file = 'wordlist.txt';
open(INFO, $file);
my @lines = <INFO>;
close(INFO);
chomp(@lines);
print "\n\n\n\nGenerating Wordlist from \'$input\'...\n";
print "----------\n";
#find all 3 letter words and print out
my @permutations3 = map { join "", @$_ } combine(3,@inputarray);
my %dup3;
@dup3{@permutations3} = ();
my @tacobell3 = grep { exists $dup3{$_} } @lines;
foreach (@tacobell3) {
print "$_\n";
}
#find all 4 letter words and print out
my @permutations4 = map { join "", @$_ } combine(4,@inputarray);
my %dup4;
@dup4{@permutations4} = ();
my @tacobell4 = grep { exists $dup4{$_} } @lines;
foreach (@tacobell4) {
print "$_\n";
}
#find all 5 letter words and print out
my @permutations5 = map { join "", @$_ } combine(5,@inputarray);
my %dup5;
@dup5{@permutations5} = ();
my @tacobell5 = grep { exists $dup5{$_} } @lines;
foreach (@tacobell5) {
print "$_\n";
}
#find all 6 letter words and print out
my @permutations6 = map { join "", @$_ } permute(@inputarray);
my %dup6;
@dup6{@permutations6} = ();
my @tacobell6 = grep { exists $dup6{$_} } @lines;
foreach (@tacobell6) {
print "$_\n";
}
print "----------\n\n\n";
$_ = <STDIN>;
exit();
The point Goib brought up is correct. Given the letters tuegps, I would want to output 'getups' 'getup' 'guest' 'gust' 'tug' etc. The code I have is SO CLOSE! It's just having an issue matching the words :/
Thanks again for all the help!
-Alan | [reply] [d/l] |