in reply to literati cheat / finding words from scrambled letters
Using this, I ran through a 200,000-word file in around a second, using almost no memory in the process. And I'm sure someone imaginative will be able to significantly improve on the efficiency of the function and file reads.use Benchmark; $letters = 'chunzcii'; $lhash{$_}++ for split //, $letters; $lcount = length($letters); $t0 = new Benchmark; open ($handle, 'dictionary.dat'); while (<$handle>) { chomp; $w++ if scrabble($_); $c++; } close ($handle); $t1 = new Benchmark; $td = timediff($t1, $t0); print "the code took:",timestr($td),"\n"; print "$w matches from $c words"; sub scrabble { return 0 if length($_[0]) > $lcount; my %wlhash; $wlhash{$_}++ for split //, $_[0]; for (keys %wlhash) { return 0 if $lhash{$_} < $wlhash{$_} && ($nf += $wlhash{$_} - $lhash{$_}) > $blanks; } return 1; }
EDIT: Actually, Benchmark says I used 3.93 seconds (124 matches out of 201252 words).Then I tried a different algorithm, which took 2.32 seconds:
Incidently, loading all 200,000+ words into an array and then cycling through them increased the time to 3.08 seconds, so not only is this more wasteful of memory, but it seems to be less efficient as well.use Benchmark; $letters = 'chunzcii'; $sorted = join '', sort split //, $letters; $lcount = length($letters); $t0 = new Benchmark; open ($handle, 'dictionary.dat'); while (<$handle>) { chomp; $w++ if scrabble($_); $c++; } close ($handle); $t1 = new Benchmark; $td = timediff($t1, $t0); print "the code took:",timestr($td),"\n"; print "$w matches from $c words"; sub scrabble { return 0 if length($_[0]) > $lcount; $p = 0; for (sort split //, $_[0]) { return 0 if !($p = index($sorted, $_, $p) + 1); } return 1; }
EDIT: Oh wait, you need support for blanks. Guess the array method is out, since hashing is much more suited for this. 4.27 seconds for 1240 matches from 201252 words, with one blank:
And 4.53 seconds for 10230 matches from 201252 words, with two blanks.use Benchmark; $letters = 'chuncii_'; $lcount = length($letters); while ($letters =~ /[^a-z]/) { $letters =~ s/[^a-z]//; $blanks++; } $lhash{$_}++ for split //, $letters; $t0 = new Benchmark; open ($handle, 'dictionary.dat'); while (<$handle>) { chomp; $w++ if scrabble($_); $c++; } close ($handle); $t1 = new Benchmark; $td = timediff($t1, $t0); print "the code took:",timestr($td),"\n"; print "$w matches from $c words"; sub scrabble { return 0 if length($_[0]) > $lcount; my %wlhash; $wlhash{$_}++ for split //, $_[0]; $nf = 0; for (keys %wlhash) { return 0 if $lhash{$_} < $wlhash{$_} && ($nf += $wlhash{$_} - $lhash{$_}) > $blanks; } return 1; }
And 4.98 seconds for 65472 matches from 201252 words, with four blanks.
And 5.07 seconds for 135718 matches from 201252 words, with all eight letters blanks.
EDIT: And sulfericacid couldn't figure out how to make it print the matches, so here's yet one more version. Note that this version requires storing all matches in memory at once, since I'm having them sorted by length and then alphabetically. Now I should probably contact Yahoo and tell them you're cheating...
Returned (from a small dictionary sub-set):$letters = 'chuncii_'; $lcount = length($letters); while ($letters =~ /[^a-z]/) { $letters =~ s/[^a-z]//; $blanks++; } $lhash{$_}++ for split //, $letters; open ($handle, 'dictionary2.dat'); while (<$handle>) { chomp; push @matches, $_ if scrabble($_); } close ($handle); print join "\n", sort { length($b) <=> length($a) || $a cmp $b } @matc +hes; sub scrabble { return 0 if length($_[0]) > $lcount; my %wlhash; $wlhash{$_}++ for split //, $_[0]; $nf = 0; for (keys %wlhash) { return 0 if $lhash{$_} < $wlhash{$_} && ($nf += $wlhash{$_} - $lhash{$_}) > $blanks; } return 1; }
EDIT: I also have a use strict / warnings version up on my scratchpad now. Though I can't guarantee it will be there forever.zucchini acacia couch lunch kick zinc cut did hen hit ice kin nip sun ugh pi a
|
|---|