in reply to fast lookups in files

citromatik:

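A couple of methods you could try:

- Scan the file and build a hash of (key,file position) pairs so you can immediately seek to the line you want.

- Alternatively, sort the file and use a binary search to locate the line. This way, you'd need roughly log2(line_count) seek/reads to get the data you're looking for.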

In either case, you'll probably want to refer to the functions:

perldoc -f seek

perldoc -f tell
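
As a rough illustration of how tell and seek fit together, here is a minimal sketch that records the byte offset of every line in a hash and later jumps straight to a line. The sub names build_offset_index and fetch_line are hypothetical, and the sketch assumes the key is the first tab-separated field of each line.

```perl
use strict;
use warnings;

# Hypothetical helper: map each key to the byte offset of its line.
# Assumption: the key is the first tab-separated field of the line.
sub build_offset_index {
    my ($fh) = @_;
    my %pos;
    seek($fh, 0, 0) or die "seek failed: $!";
    while (1) {
        my $offset = tell($fh);          # where the next line starts
        my $line   = <$fh>;
        last unless defined $line;
        my ($key) = $line =~ /^([^\t\n]*)/;
        $pos{$key} = $offset unless exists $pos{$key};   # keep first occurrence
    }
    return \%pos;
}

# Hypothetical helper: jump to the recorded offset and read one line.
sub fetch_line {
    my ($fh, $pos, $key) = @_;
    return unless exists $pos->{$key};
    seek($fh, $pos->{$key}, 0) or die "seek failed: $!";
    return scalar <$fh>;
}
```

You would open the file once, call build_offset_index($fh), and then call fetch_line($fh, $pos, $key) for each lookup. The returned line still carries its trailing newline.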

...roboticus

Re^2: fast lookups in files
by citromatik (Curate) on Feb 05, 2008 at 12:54 UTC
    Scan the file and build a hash of (key,file position) pairs so you can immediately seek to the line you want.

    Well, if a hash of key => value pairs doesn't fit into memory, I'm afraid a hash of key => position pairs won't fit either.

    Alternatively, sort the file and use a binary search to locate the line. This way, you'd need roughly log2(line_count) seek/reads to get the data you're looking for.

    That would be a better solution, since the file is already sorted.

    Thanks

    citromatik
      You could keep an array in which subscript N stands for key N thousand, and the value of that element is the disk offset to seek to.

      Let's assume you have fixed-length records of 30 bytes. Element 21 having the value 999990 would mean that the first key greater than or equal to 21,000 is stored at disk offset 999,990.
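
The array described above could be sketched roughly like this. The sub names and the record layout are assumptions made for illustration: each 30-byte record is taken to start with a numeric key, and the file is taken to be sorted numerically by that key.

```perl
use strict;
use warnings;

my $RECLEN = 30;     # fixed record length in bytes, as in the example
my $STEP   = 1000;   # one array slot per thousand keys

# Hypothetical helper: $index->[N] is the offset of the first record
# whose key is >= N * $STEP.
# Assumption: every record begins with a numeric key.
sub build_sparse_index {
    my ($fh) = @_;
    my @index;
    my $offset = 0;
    seek($fh, 0, 0) or die "seek failed: $!";
    while ((read($fh, my $rec, $RECLEN) || 0) == $RECLEN) {
        my ($key) = ($rec =~ /^(\d+)/) or die "bad record at offset $offset";
        my $slot = int($key / $STEP);
        # Fill this slot, plus any earlier slots that had no keys of their own.
        push @index, $offset while @index <= $slot;
        $offset += $RECLEN;
    }
    return \@index;
}

# Hypothetical helper: seek to the slot's offset, then scan forward.
sub lookup_record {
    my ($fh, $index, $target) = @_;
    my $slot = int($target / $STEP);
    return if $slot > $#$index;          # beyond the largest indexed key
    seek($fh, $index->[$slot], 0) or die "seek failed: $!";
    while ((read($fh, my $rec, $RECLEN) || 0) == $RECLEN) {
        my ($key) = ($rec =~ /^(\d+)/) or die "bad record";
        return $rec if $key == $target;
        return if $key > $target;        # passed the spot where it would be
    }
    return;
}
```

The array holds one offset per thousand keys rather than one per key, so it is far smaller than a full hash. Each lookup costs one seek plus a short sequential scan.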

      Having said all that, I'd also suggest using a binary search.
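
For the binary search, a minimal sketch might look like the following. It bisects on byte offsets instead of line numbers, so it also works with variable-length lines. The sub names are hypothetical, and it assumes the key is the first tab-separated field and that the file is sorted in plain string order (the order that lt compares by).

```perl
use strict;
use warnings;

# Hypothetical helper: extract the key from a line.
# Assumption: the key is the first tab-separated field.
sub key_of {
    my ($line) = @_;
    chomp $line;
    my ($key) = split /\t/, $line, 2;
    return defined $key ? $key : '';
}

# Return the line whose key equals $target, or undef if there is none.
sub lookup_sorted {
    my ($fh, $target) = @_;
    my ($lo, $hi) = (0, -s $fh);         # byte offsets, not line numbers

    while ($hi - $lo > 1) {
        my $mid = int(($lo + $hi) / 2);
        seek($fh, $mid, 0) or die "seek failed: $!";
        my $partial = <$fh>;             # discard the line we landed inside
        my $line    = <$fh>;             # first complete line after $mid
        if (defined $line && key_of($line) lt $target) {
            $lo = $mid;                  # target lies further on
        } else {
            $hi = $mid;                  # target is here or earlier
        }
    }

    # Finish with a short forward scan from $lo.
    seek($fh, $lo, 0) or die "seek failed: $!";
    if ($lo > 0) { my $partial = <$fh>; }
    while (defined(my $line = <$fh>)) {
        my $key = key_of($line);
        next if $key lt $target;
        return $key eq $target ? $line : undef;
    }
    return;
}
```

Because it halves a byte range, the number of seeks is about log2 of the file size in bytes, which is a little more than log2(line_count) but of the same order. If the keys are numeric, the lt and eq comparisons would need to become < and ==.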