Re: fast lookups in files

in reply to fast lookups in files

A couple of methods you could try:

Scan the file and build a hash of (key,file position) pairs so you can immediately seek to the line you want. However, this requires a complete scan through the file (the first time).
Alternatively, sort the file and use a binary search to locate the line. This way, you'd need roughly log2(line_count) seek/reads to get the data you're looking for.

In either case, you'll probably want to refer to the functions:

perldoc -f seek

perldoc -f tell

Comment on Re: fast lookups in files

Replies are listed 'Best First'.
Re^2: fast lookups in files by citromatik (Curate) on Feb 05, 2008 at 12:54 UTC
Scan the file and build a hash of (key,file position) pairs so you can immediately seek to the line you want. Well, if a hash of key => values doesn't fit into memory I 'm afraid that a hash with keys => positions will not fit either. Alternatively, sort the file and use a binary search to locate the line. This way, you'd need roughly log2(line_count) seek/reads to get the data you're looking for. That would be a better solution (the file is already sorted) Thanks citromatik	[reply]
Re^3: fast lookups in files by apl (Monsignor) on Feb 05, 2008 at 13:35 UTC
You could have an array where subscript N indicates key N thousand, and the value of the array element would be the disk offset to look at. Let's assume you have fixed length records of 30 bytes. Element 21 having value 999990 would mean the first key greater than or equal to 21,000 was stored at disk offset 999,990. Having said all that, I'd also suggest using a binary search.	[reply]

In Section Seekers of Perl Wisdom