in reply to Re: Re: Fast wordlist lookup for game
in thread Fast wordlist lookup for game
As I said I would last night, I went back and adapted my to original programs to use 3-character indexing instead of two, to bring it into line with yours and benefit from the smaller linear search.
The result, using my original 26 "worst case" words, is a reduction in average search time by nearly an order of magnitude from 20ms/word to 2.7ms/word, which is kind of pleasing.
If you can make your 10k test file available to me, or we can agree on a suitable publically available test file, we could try comparing apples with apples:^).
I do love it when someone makes me work for my one of my boasts!
Indexer code
#! perl -sw use strict; my ($oldfirst, $oldc2n3) = ( '', '' ); open WORDS, '<', $ARGV[0] or die $ARGV[0] . $1 . $/; while(<WORDS>) { chomp; $_ = lc; my ($first, $c2n3) = /^(.)(..?)?.*/; $first = './' . $first; $c2n3 = substr($c2n3.'__',0,2) if !defined $c2n3 or length($c2n3) +< 2; $c2n3 = $first . '/' . $c2n3; if ( ($first ne $oldfirst) or ($c2n3 ne $oldc2n3) ) { mkdir $first if not -e $first; open SUB, '>>', $c2n3 or die $c2n3 . $! . $/; } print SUB $_, $/; ($oldfirst, $oldc2n3) = ($first, $c2n3); print "\r$c2n3:$_ "; }
Wordcheck test program
#! perl -sw use strict; use Benchmark::Timer; sub IsWord { return if not @_ or length == 1; my ($word, $first, $c2n3) = $_[0] =~ /^((.)(..?)?.*)$/; local $_ = do{ local (@ARGV,$/) = "./$first/$c2n3"; <> }; return /^$word$/im; } my $timer = Benchmark::Timer->new(skip=>0); for (@ARGV) { $timer->start; my $IsWord = IsWord( $_ ); $timer->stop; print $_, $IsWord ? ' is ' : ' is not ', 'a word', $/; } $timer->report; print "@{[$timer->data('_default')]}", $/; __END__
Results and 'benchmark'.
C:\test\WORDS>../208899-2 Anonymous Baltic Confirm Delta Entity Font G +regarious Hazzard Indigo Jalopy Kilo Lascivious Matter Notorious Ovul +ation Prosthetic Quinine Reading Semaphore Transendental Uniform Vivisection Weather Xenophobia Yesterday Zorro Anonymous is a word Baltic is a word Confirm is a word Delta is a word Entity is a word Font is a word Gregarious is a word Hazzard is a word Indigo is a word Jalopy is a word Kilo is a word Lascivious is a word Matter is a word Notorious is a word Ovulation is a word Prosthetic is a word Quinine is a word Reading is a word Semaphore is a word Transendental is not a word Uniform is a word Vivisection is a word Weather is a word Xenophobia is a word Yesterday is a word Zorro is a word 26 trials of _default (70ms total), 2.692ms/trial 0 0 0 0 0 0.01 0 0 0.01 0 0 0 0 0 0 0.01 0.01 0 0.01 0.01 0 0 0 0 0.01 + 0 C:\test\WORDS>
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Re: Re: Re: Fast wordlist lookup for game
by grantm (Parson) on Oct 30, 2002 at 20:17 UTC | |
by BrowserUk (Patriarch) on Oct 30, 2002 at 22:29 UTC | |
by grantm (Parson) on Oct 30, 2002 at 23:29 UTC |