As I said I would last night, I went back and adapted my to original programs to use 3-character indexing instead of two, to bring it into line with yours and benefit from the smaller linear search.

The result, using my original 26 "worst case" words, is a reduction in average search time by nearly an order of magnitude from 20ms/word to 2.7ms/word, which is kind of pleasing.

If you can make your 10k test file available to me, or we can agree on a suitable publically available test file, we could try comparing apples with apples:^).

I do love it when someone makes me work for my one of my boasts!

Indexer code

#! perl -sw use strict; my ($oldfirst, $oldc2n3) = ( '', '' ); open WORDS, '<', $ARGV[0] or die $ARGV[0] . $1 . $/; while(<WORDS>) { chomp; $_ = lc; my ($first, $c2n3) = /^(.)(..?)?.*/; $first = './' . $first; $c2n3 = substr($c2n3.'__',0,2) if !defined $c2n3 or length($c2n3) +< 2; $c2n3 = $first . '/' . $c2n3; if ( ($first ne $oldfirst) or ($c2n3 ne $oldc2n3) ) { mkdir $first if not -e $first; open SUB, '>>', $c2n3 or die $c2n3 . $! . $/; } print SUB $_, $/; ($oldfirst, $oldc2n3) = ($first, $c2n3); print "\r$c2n3:$_ "; }

Wordcheck test program

#! perl -sw use strict; use Benchmark::Timer; sub IsWord { return if not @_ or length == 1; my ($word, $first, $c2n3) = $_[0] =~ /^((.)(..?)?.*)$/; local $_ = do{ local (@ARGV,$/) = "./$first/$c2n3"; <> }; return /^$word$/im; } my $timer = Benchmark::Timer->new(skip=>0); for (@ARGV) { $timer->start; my $IsWord = IsWord( $_ ); $timer->stop; print $_, $IsWord ? ' is ' : ' is not ', 'a word', $/; } $timer->report; print "@{[$timer->data('_default')]}", $/; __END__

Results and 'benchmark'.

C:\test\WORDS>../208899-2 Anonymous Baltic Confirm Delta Entity Font G +regarious Hazzard Indigo Jalopy Kilo Lascivious Matter Notorious Ovul +ation Prosthetic Quinine Reading Semaphore Transendental Uniform Vivisection Weather Xenophobia Yesterday Zorro Anonymous is a word Baltic is a word Confirm is a word Delta is a word Entity is a word Font is a word Gregarious is a word Hazzard is a word Indigo is a word Jalopy is a word Kilo is a word Lascivious is a word Matter is a word Notorious is a word Ovulation is a word Prosthetic is a word Quinine is a word Reading is a word Semaphore is a word Transendental is not a word Uniform is a word Vivisection is a word Weather is a word Xenophobia is a word Yesterday is a word Zorro is a word 26 trials of _default (70ms total), 2.692ms/trial 0 0 0 0 0 0.01 0 0 0.01 0 0 0 0 0 0 0.01 0.01 0 0.01 0.01 0 0 0 0 0.01 + 0 C:\test\WORDS>

Nah! Your thinking of Simon Templar, originally played by Roger Moore and later by Ian Ogilvy

In reply to Re: Re: Re: Fast wordlist lookup for game by BrowserUk
in thread Fast wordlist lookup for game by jerrygarciuh

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.