in reply to Speeding up data lookups

Using mmap gives you the same memory-sharing effect as reading everything in one process and then forking, while being easier to manage and somewhat more efficient to read. It also lets your OS manage the memory for you: data is only paged in as it's needed, and the system's buffer cache can drop pages if it needs the memory and read them back from disk later.

Another possibility is to steal an idea from dbz, which indexes a fixed-format text file by storing each record's key in a database, with the file offset of the start of the record as the value. This is fairly straightforward to do with Berkeley DB. A few well-chosen indexes can make a huge difference without requiring a full rewrite into a relational database.
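A minimal sketch of building such an index with DB_File (the standard Perl interface to Berkeley DB); the file names and the `|`-delimited record layout here are illustrative assumptions, not anything from the original post:

```perl
#!/usr/bin/perl
# Sketch: build a dbz-style offset index with DB_File (Berkeley DB).
# 'records.dat', 'records.idx', and the '|'-delimited layout are assumed.
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);

# A tiny sample data file, standing in for the big flat file.
open(OUT, '>', 'records.dat') or die "write data: $!";
print OUT "IBM|100.5|2005-09-19\n";
print OUT "MSFT|27.1|2005-09-19\n";
close OUT;

# Tie a Berkeley DB hash: record key => byte offset of start of record.
tie my %index, 'DB_File', 'records.idx', O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot tie index: $!";

open(DAT, '<', 'records.dat') or die "read data: $!";
while (1) {
    my $offset = tell DAT;                 # offset BEFORE reading the line
    defined(my $line = <DAT>) or last;
    my ($key) = split /\|/, $line;         # first field is the record key
    $index{$key} = $offset;
}
close DAT;

my ($ibm_off, $msft_off) = ($index{IBM}, $index{MSFT});
print "MSFT record starts at byte $msft_off\n";
untie %index;
```

The index only has to be rebuilt when the flat file changes, so the one-time `tell`-and-scan cost is paid once rather than on every lookup.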

So if you used mmap and a dbz-style index together, you'd get the file offset from the DB, then use substr to inspect the mmap'd data at that location.
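The combined approach could look something like the following sketch, assuming the Sys::Mmap module from CPAN and a hypothetical `|`-delimited, newline-terminated record format:

```perl
#!/usr/bin/perl
# Sketch of the combined lookup: Berkeley DB maps key -> byte offset,
# and substr() pulls the record out of the mmap'd file without copying
# the whole thing. Sys::Mmap (CPAN) and all file names are assumptions.
use strict;
use warnings;
use DB_File;
use Sys::Mmap;
use Fcntl qw(O_RDWR O_CREAT);

# Tiny sample data file standing in for the 1GB flat file.
open(OUT, '>', 'quotes.dat') or die $!;
print OUT "IBM|100.50\n", "MSFT|27.10\n";
close OUT;

# Build the dbz-style index: record key => offset of start of record.
tie my %index, 'DB_File', 'quotes.idx', O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "tie: $!";
open(DAT, '<', 'quotes.dat') or die $!;
while (1) {
    my $offset = tell DAT;
    defined(my $line = <DAT>) or last;
    my ($key) = split /\|/, $line;
    $index{$key} = $offset;
}

# Map the whole file into $data (length 0 means the entire file).
mmap(my $data, 0, PROT_READ, MAP_SHARED, DAT) or die "mmap: $!";

sub lookup {
    my ($key) = @_;
    my $offset = $index{$key};
    return undef unless defined $offset;
    my $end = index($data, "\n", $offset);   # records are newline-terminated
    return substr($data, $offset, $end - $offset);
}

my $rec = lookup('MSFT');
print "$rec\n";

munmap($data);
close DAT;
untie %index;
```

Each lookup is one DB fetch plus a `substr`, so only the pages holding the requested record ever get faulted in.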

Re^2: Speeding up data lookups
by suaveant (Parson) on Sep 19, 2005 at 19:31 UTC
    The way the data is currently stored there is only ever one key... the security identifier. mmap would probably make sense, though that is only C, yes? Or is there a way to do it in perl....

    If it's C-only, then I would probably leave it as an option to look at if the pure-Perl approaches fail.

                    - Ant

      As graff says, using mmap from Perl is straightforward using the Mmap module, which will map the file to a Perl string. You do have to be careful how you access the data; some operations will cause Perl to make a copy of the string, which is a bit of a problem with a 1GB string. substr is pretty safe, and probably most other things as long as you're careful not to write anything to the string.
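      A minimal sketch of that access pattern, using Sys::Mmap (the CPAN successor to the Mmap module); the file name and contents are made up for illustration:

```perl
#!/usr/bin/perl
# Minimal sketch: read-only access to an mmap'd file through substr().
# Sys::Mmap (CPAN) and 'demo.dat' are assumptions for this example.
use strict;
use warnings;
use Sys::Mmap;

open(OUT, '>', 'demo.dat') or die $!;
print OUT "hello, mmap world\n";
close OUT;

open(DAT, '<', 'demo.dat') or die $!;
mmap(my $buf, 0, PROT_READ, MAP_SHARED, DAT) or die "mmap: $!";

# substr() copies only the bytes you ask for, not the whole mapping.
my $word = substr($buf, 7, 4);
print "$word\n";

# Operations to avoid on the mapped scalar:
#   my $copy = $buf;       # copies the entire mapping into a new string
#   $buf =~ s/foo/bar/;    # writes to a read-only mapping

munmap($buf);
close DAT;
```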

      I wrote up a sample grep implementation using mmap here: Re: anyone have some good mmap examples?

      MMap

      You'll probably find a few nodes at PerlMonks that discuss this module.