in reply to Table Generation vs. Flat File vs. DBM

how is the data currently being generated? obviously the speed will only improve by the difference between however it's being generated now and reading it off the disk.

in general, using DBM is easy. i just played with it the other day and got it working within an hour of being told i needed to work with it (having never used it in perl before).
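for anyone who hasn't tried it, here is a minimal sketch of what "working with DBM" can look like in perl. it assumes SDBM_File (which ships with core perl); the file name and the keys are made up for illustration, and other DBM flavors are tied the same way with a different module name.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;                   # assumption: SDBM, since it ships with core perl
use File::Temp qw(tempdir);

# hypothetical location and data, for illustration only
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/table";

# write: tie a hash to the DBM file and assign to it as usual
tie(my %table, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $file: $!";
$table{'red-small'}  = 1;
$table{'blue-large'} = 2;
untie %table;

# read it back, the way a later run of the program would
tie(my %again, 'SDBM_File', $file, O_RDONLY, 0666)
    or die "Cannot tie $file: $!";
my $value = $again{'blue-large'};
print "blue-large => $value\n";
untie %again;
```

one caveat with this choice: SDBM limits the size of each key/value pair, so large records would need a different DBM module.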

perl -e"\$_=qq/nwdd\x7F^n\x7Flm{{llql0}qs\x14/;s/./chr(ord$&^30)/ge;print"

Re: Re: Table Generation vs. Flat File vs. DBM
by mhearse (Chaplain) on May 05, 2004 at 06:19 UTC
    The large tables are generated with map and grep loops, basically making many combinations of a few things. I guess I will try dumping the static values into a flat file and a DBM file, then benchmark it both ways. I guess my question now is: "Is opening and reading a flat file faster than opening and reading a DBM file? Are there any performance benefits to one or the other?"
      YMMV, depending on which flavor of DBM you pick and how your hardware gets along with it. As a general rule, writing to any sort of DBM file tends to be somewhat more expensive than writing a flat file, in terms of overall space consumed, amount of actual disk i/o performed, and total cpu time required.

      But when reading data back after you've stored it, a DBM file is vastly better, especially when fetching values in a quasi-random fashion from a very large set -- or at least, whenever the fetching order is very different from the storage order. In such cases, doing repeated sequential searches over a flat file will kill you, whereas the DBM file is really just a big hash array on disk, optimized to deliver any chosen piece of data in a consistently short amount of time.
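To make the contrast concrete, here is a rough sketch of the two lookup styles described above. It assumes SDBM_File (core Perl) and a tab-separated flat file; the keys, values, file names, and the helper names flat_lookup and dbm_lookup are all hypothetical.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT
use SDBM_File;                   # assumption: any DBM flavor would do here
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $flat = "$dir/table.txt";     # hypothetical flat file
my $dbm  = "$dir/table";         # hypothetical DBM file

# Store the same made-up key/value pairs both ways.
open(my $out, '>', $flat) or die "Cannot write $flat: $!";
tie(my %db, 'SDBM_File', $dbm, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $dbm: $!";
for my $n (1 .. 1000) {
    print {$out} "key$n\tvalue$n\n";
    $db{"key$n"} = "value$n";
}
close($out) or die "Cannot close $flat: $!";

# Flat file: every lookup is a sequential scan from the top.
sub flat_lookup {
    my ($want) = @_;
    open(my $in, '<', $flat) or die "Cannot read $flat: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        return $value if $key eq $want;
    }
    return;                      # not found
}

# DBM: the tied hash fetches the record by key, no scan needed.
sub dbm_lookup {
    my ($want) = @_;
    return $db{$want};
}

my $from_flat = flat_lookup('key987');
my $from_dbm  = dbm_lookup('key987');
print "flat: $from_flat, dbm: $from_dbm\n";
untie %db;
```

Both calls return the same value; the difference is that flat_lookup reads 987 lines to get there, and it pays a similar cost again on every later lookup.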

      So the question really is "what sort of access do you really need when reading the data back?" If you can easily write a flat file such that you just need to read it back once from beginning to end, then a flat file will be the better choice.
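For the read-it-once case, a sketch along these lines may be all that is needed: one pass over the flat file to fill an in-memory hash, after which every lookup is an ordinary hash access. The tab-separated layout and the sample data are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical table: each line is "key<TAB>value".
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "$_\t" . ($_ * $_) . "\n" for 1 .. 5;
close($fh) or die "Cannot close $path: $!";

# One sequential pass from beginning to end.
my %table;
open(my $in, '<', $path) or die "Cannot read $path: $!";
while (my $line = <$in>) {
    chomp $line;
    my ($key, $value) = split /\t/, $line, 2;
    $table{$key} = $value;
}
close($in) or die "Cannot close $path: $!";

# After loading, lookups never touch the disk again.
print "4 => $table{4}\n";
```

This only pays off if the whole table fits comfortably in memory; otherwise the DBM route is the one to benchmark.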