Re: Rapid text searches

OK, let me describe how I see your problem: you have consecutive IDs in two columns, and you're looking for the lines where a specific ID occurs in one of the columns?

I assume it's more complicated than in your example and gaps are possible, so you can't just trivially calculate the right lines to look for?

And a binary search isn't an option either?

With a strict ordering it's like a telephone book: there's no need to start reading from the first page. First look for the right city, then for the starting letters.
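To make the binary search idea concrete, here is a minimal sketch. It assumes the IDs of one column are sorted ascending and already sit in an array (for a big file you would seek on byte offsets instead); find_id and the sample IDs are made up for illustration.

```perl
use strict;
use warnings;

# Return the index of $target in the sorted array @$ids, or -1 if absent.
sub find_id {
    my ($ids, $target) = @_;
    my ($lo, $hi) = (0, $#$ids);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $cmp = $ids->[$mid] <=> $target;
        return $mid if $cmp == 0;
        if ($cmp < 0) { $lo = $mid + 1 }   # target is in the upper half
        else          { $hi = $mid - 1 }   # target is in the lower half
    }
    return -1;
}

# Made-up sample data with gaps between the IDs.
my @ids = (1101077781158, 1101077781160, 1101077781165, 1101077781201);
print find_id(\@ids, 1101077781160), "\n";   # prints 1
```

Gaps don't matter here, only the ordering does.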

I would build up a hash of hashes for indexing ranges of line numbers for each column.

With 1101077781160 you'd look up $hash1{11010}, which gives you a second hash; looking up $hash2{77781} in that one tells you the range of lines to search for "160".
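A rough sketch of such an index, for one column only and with everything in memory. The 5/5/3 split of the digits follows the example above; split_id, build_index and lookup are hypothetical names, the IDs are assumed to be sorted, and it needs Perl 5.10 or later for the //= operator.

```perl
use strict;
use warnings;

# Split an ID into first-level key, second-level key and the remainder.
sub split_id {
    my ($id) = @_;
    return (substr($id, 0, 5), substr($id, 5, 5), substr($id, 10));
}

# Build $index{$k1}{$k2} = [first_line, last_line] from IDs sorted ascending.
sub build_index {
    my ($ids) = @_;
    my %index;
    for my $line (0 .. $#$ids) {
        my ($k1, $k2) = split_id($ids->[$line]);
        my $range = $index{$k1}{$k2} //= [$line, $line];
        $range->[1] = $line;               # extend the range to this line
    }
    return \%index;
}

# Return the line number of $id, or nothing if it does not occur.
sub lookup {
    my ($index, $ids, $id) = @_;
    my ($k1, $k2) = split_id($id);
    exists $index->{$k1} or return;        # avoid autovivifying new keys
    my $range = $index->{$k1}{$k2} or return;
    for my $line ($range->[0] .. $range->[1]) {
        return $line if $ids->[$line] == $id;
    }
    return;
}

# Made-up sample data: the array position stands in for the line number.
my @ids   = (1101077781158, 1101077781160, 1101077781165, 1101077782001);
my $index = build_index(\@ids);
my $line  = lookup($index, \@ids, 1101077781160);
print defined $line ? "found at line $line\n" : "not found\n";
```

The scan inside the range is linear here; with long ranges you could binary search within it instead.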

The first-level hash (the city) should have a size that's easily kept in RAM; the second-level hashes should be loaded on demand (and some, maybe the last hundred, kept cached).

Of course you could have more levels of hashes. I'm not sure about the best way to make hashes persistent and quick to load, but these are CPAN details.
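One possible way (not necessarily the best) is the core module Storable: write each second-level hash to its own file and pull it in on first use. A minimal sketch, where the temp directory, the ".idx" file naming, the line range and second_level are all made up for illustration, and the cache never evicts anything:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);    # stand-in for a real index directory

# Write one second-level hash per first-level key.
my %second = ('77781' => [0, 2]);   # made-up range of line numbers
store(\%second, "$dir/11010.idx");

my %cache;   # second-level hashes already loaded, keyed by first-level key

# Load the second-level hash for $k1 on demand and keep it cached.
sub second_level {
    my ($k1) = @_;
    my $file = "$dir/$k1.idx";
    return $cache{$k1} //= (-e $file ? retrieve($file) : undef);
}

my $h = second_level('11010');
print "range: @{ $h->{77781} }\n";  # prints "range: 0 2"
```

Limiting the cache to the last hundred entries would need some bookkeeping of access order on top of this.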

Cheers Rolf
