it came out approximately 20 ms/word ... I seriously doubt that any DBM solution will beat that!
Well that sounds like a challenge :-)
I started with a dictionary file containing about 100,000 words; created a GDBM file keyed on first 3 letters of words; and checked each word from a document containing 10479 words.
Program to create index...
use strict; use GDBM_File ; my %hash; tie %hash, 'GDBM_File', 'words.dbm', &GDBM_WRCREAT, 0640; my $dict_file = '/usr/share/dict/longlist'; open DICT, "<$dict_file" or die "open($dict_file): $!"; my @words = (); my $prefix = ''; while(<DICT>) { chomp; $_ = lc($_); if(substr($_, 0, 3) ne $prefix) { if(@words) { $hash{$prefix} = join(' ', @words); @words = (); } $prefix = substr($_, 0, 3); } push @words, $_; } untie %hash ;
If I tie the GDBM file once and check all the words the script runs in under a second which is (a) well under one millisecond per word and (b) probably not how you'd do it unless you had a dedicated wordcheck daemon
If I tie the GDBM file once before checking each word (and then untie it afterwards) then checking all the words takes 50 seconds. Which is less than 5 milliseconds per word.
use strict; use Benchmark; use GDBM_File ; my $dbm_file = 'words.dbm'; my %hash; $/ = undef; my $data = <DATA>; my @words = map {lc($_)} ($data =~ /(\w+)/g); print scalar(@words), "\n"; timethis(1, 'one_tie()'); timethis(1, 'many_ties()'); sub one_tie { my($found, $not_found) = (0, 0); tie %hash, 'GDBM_File', $dbm_file, &GDBM_WRCREAT, 0640; foreach (@words) { my $prefix = substr($_, 0, 3); if(exists($hash{$prefix}) and $hash{$prefix} =~ /\b$_\b/) { $found++; } else { $not_found++; } } untie %hash ; print "Found: $found\nNot found: $not_found\n"; } sub many_ties { my($found, $not_found) = (0, 0); foreach (@words) { tie %hash, 'GDBM_File', $dbm_file, &GDBM_WRCREAT, 0640; my $prefix = substr($_, 0, 3); if(exists($hash{$prefix}) and $hash{$prefix} =~ /\b$_\b/) { $found++; } else { $not_found++; } untie %hash ; } print "Found: $found\nNot found: $not_found\n"; } __DATA__ insert output of 'perldoc perltoot' here :-)
Update: Of course you didn't mention what hardware you're running on - mines a 800Mhz PIII
Update 2: Added code for generating index in <readmore> tags.
In reply to Re: Re: Fast wordlist lookup for game
by grantm
in thread Fast wordlist lookup for game
by jerrygarciuh
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |