in reply to Re: Re: Re: Fast wordlist lookup for game
in thread Fast wordlist lookup for game

I've updated my original post to include the code I used to generate the index. So you can use that with your dictionary word list.

The 10,000 word document I used in my benchmark test was the output of `perldoc perltoot`.

I'm definitely interested in your results, unfortunately I've got some real work to do so I can't play right now. The main reason I posted a DBM solution was that many people are unaware of this middle ground between storing your data in a flat text file or using a full-blown RDBMS solution. One advantage is the shallow learning curve - if you can use a hash then with 3 more lines of code (use, tie, untie) then you can use a DBM solution. Another advantage is the blazing speed - the various DBM solutions are coded in highly optimised 'C' and use efficient indexing algorithms that can perform better than more general solutions used in RDBMS products.

I also had a rethink about one tie vs one tie per lookup. Assuming that the final solution was going to run under mod_perl, you probably would tie once when the script was first invoked. Kind of like using a persistent database connection with DBI. It would only be safe to keep the DBM file tied if it was not being written to. For the performance boost it would give, it might be worth batching any required updates and using the DBM file in readonly mode.

  • Comment on Re: Re: Re: Re: Fast wordlist lookup for game

Replies are listed 'Best First'.
Re: Re: Re: Re: Re: Fast wordlist lookup for game
by BrowserUk (Patriarch) on Oct 30, 2002 at 22:29 UTC

    With a couple of changes to my program, I ran it on the output of perldoc perltoot and got the following results:

    C:\test>208899-2 junk.dat Of 10479 word-like things, 7803 where flagged as words and 2676 as not 10479 trials of _default (58.184s total), 5.552ms/trial

    Which if my ad-hoc conversion factor of *233/800 has any merit puts it at 1.61702ms/word which stands up pretty well given the that it has to re-open the file each time.

    Not having used any of the *dbm varients, I was impressed by the efficiency and simplicity of use that your program demonstrated, and when I looked, found that dada's site has a Win32 port of GDBM which I am in the process of downloading now.

    Thanks greatly for bringing this to my attention. There are many things for which I think I will find this extrememly useful.


    Nah! Your thinking of Simon Templar, originally played by Roger Moore and later by Ian Ogilvy

      Further thoughts:

      • I suspect there's not too much merit in your *233/800 factor - those numbers are just raw processor Mhz numbers, many cycles will be wasted waiting for main memory and man, many more if disk IO is required.
      • You could also use DB_File which is available via PPM from ActiveState and I believe uses the same API
      • SDBM_File is a core perl module which also uses the same API, but has significant limitations on allowable size of data
      • See also MLDBM which allows you to store complex data structures in a DBM hash