in reply to DB_File/BerkeleyDB with large datafiles

Given this bit in your snippet of code:
    ... tie %database, ...
    ... $database{$h->{id}} = $dump; ...
It would seem that your intention/requirement is simply to fetch a chunk of data by means of a look-up key/id string. (That is, you don't need to search on the basis of the data content associated with the key/id.)

If that's true -- and if it's also true that the data content linked to each key/id remains static -- then it might suffice to keep the full (2+GB) data file in whatever form is generally most convenient (assuming it's read-only), and build a separate index (e.g. with Berkeley DB) in which each key/id string is associated with the byte offsets of its data chunk in the big file.

It would be quick and easy to do one pass over the big file to create a separate listing of key/id strings with the byte-offsets of their associated records (e.g. start_byte, n_bytes). Then build a DB_File (or equivalent) index of the keys and byte-offsets (instead of keys and data blocks).
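Just as a minimal sketch of that one pass -- assuming one record per line in the form "id<TAB>data" and made-up file names (adjust the layout, split, and names to your actual format):

    use strict;
    use warnings;
    use Fcntl;
    use DB_File;

    # Hypothetical file names -- adjust to your setup.
    my $datafile  = 'big_data.txt';
    my $indexfile = 'offsets.db';

    # Index maps each key/id to "start_byte,n_bytes" in the big file.
    tie my %index, 'DB_File', $indexfile, O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot tie $indexfile: $!";

    open my $fh, '<', $datafile or die "Cannot open $datafile: $!";

    while (1) {
        my $start = tell $fh;             # byte offset of the record about to be read
        my $line  = <$fh>;
        last unless defined $line;
        my ($id) = split /\t/, $line, 2;  # assumed layout: "id<TAB>data"
        $index{$id} = join ',', $start, length $line;
    }

    close $fh;
    untie %index;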

Depending on how many records you have, a full hash of key+byte_offset pairs might even fit in RAM...
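Lookups then just seek into the big file using the stored offsets. Again a sketch, using the same assumed names and the "start,length" format from the build step; swap the tie for a plain %index hash loaded up front if it fits in memory:

    use strict;
    use warnings;
    use Fcntl;
    use DB_File;

    my $datafile  = 'big_data.txt';   # same hypothetical names as above
    my $indexfile = 'offsets.db';

    tie my %index, 'DB_File', $indexfile, O_RDONLY, 0644, $DB_HASH
        or die "Cannot tie $indexfile: $!";

    open my $fh, '<', $datafile or die "Cannot open $datafile: $!";

    sub fetch_record {
        my ($id) = @_;
        my $entry = $index{$id} or return;    # unknown key/id
        my ($start, $len) = split /,/, $entry;
        seek $fh, $start, 0 or die "seek failed: $!";
        read $fh, my $record, $len;
        return $record;
    }

    # e.g. print fetch_record('some_id');    # hypothetical key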

Re^2: DB_File/BerkeleyDB with large datafiles
by Anonymous Monk on Sep 21, 2010 at 16:08 UTC
    Actually, I think I will give it a try. I was aware of the solution but have never tried something like this before.
    I'll have to look up how to do it, but I guess it will be the best solution.
    Thanks