in reply to Using indexing for faster lookup in large file

My first thought would be to put the data into an actual database (even something lightweight, such as SQLite via DBD::SQLite) and query it, but that does not appear to be a route you are looking into.
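A minimal sketch of the database route, written in Python with its built-in sqlite3 module as a stand-in for Perl's DBI/DBD::SQLite. It assumes each row is whitespace-separated with the lookup number as the first field, and it stores keys as text so they match exactly as written in the file. The table and column names are hypothetical. The index on the key column is what turns each lookup from a full scan into an index search.

```python
import sqlite3

def build_db(lines, db_path=":memory:"):
    """Load rows into SQLite, keyed on each row's first whitespace-separated field.

    `lines` is any iterable of text lines (an open file works).
    Table/column names here are illustrative, not from the original post.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS records (key TEXT, line TEXT)")
    rows = ((ln.split(None, 1)[0], ln.rstrip("\n")) for ln in lines if ln.strip())
    conn.executemany("INSERT INTO records (key, line) VALUES (?, ?)", rows)
    # Without this index every query would still scan the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_key ON records (key)")
    conn.commit()
    return conn

def lookup(conn, key):
    """Return all stored lines whose first field equals `key`, in file order."""
    cur = conn.execute(
        "SELECT line FROM records WHERE key = ? ORDER BY rowid", (key,)
    )
    return [row[0] for row in cur]
```

Typical use would be `with open("data.txt") as fh: conn = build_db(fh, "data.db")` once, then `lookup(conn, "12345")` as often as needed; passing a file path as `db_path` keeps the database on disk so the load cost is paid only once.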

Is the data in the file sorted on the first number in each row? If so, one method you might consider is to take a portion of that number (for example, the first 5 digits) and use the tell() function to build an index recording the byte offsets of the first and last lines that match each such portion. To look up a number, your script would find its portion in the index, which tells it where to start and stop looking; it would then seek() to the starting offset in the main file and examine records only up to the stop offset. (This might work with unsorted records as well, but the lines matching a portion could then be scattered throughout the file; in the degenerate case the first and last lines of the main file would both match, and you would end up scanning the whole file.)
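The tell()/seek() scheme above can be sketched as follows, again in Python, whose binary-mode file `tell()` and `seek()` return and accept byte offsets much like Perl's builtins. Assumptions: the file is sorted so that lines sharing a prefix are contiguous, the key is the first whitespace-separated field, and `prefix_len=5` is just the example value from above. The function names are hypothetical. Instead of the start of the last matching line, the index stores the offset just past it, which makes the stop test a simple comparison.

```python
def build_prefix_index(path, prefix_len=5):
    """Map each key prefix (bytes) to (start, end) byte offsets of its block.

    Assumes lines sharing a prefix are contiguous (i.e. the file is sorted).
    """
    index = {}
    with open(path, "rb") as fh:  # binary mode: tell() gives plain byte offsets
        while True:
            offset = fh.tell()        # where this line starts
            line = fh.readline()
            if not line:
                break
            fields = line.split(None, 1)
            if not fields:
                continue              # skip blank lines
            prefix = fields[0][:prefix_len]
            start, _ = index.get(prefix, (offset, None))
            index[prefix] = (start, fh.tell())  # end = just past this line
    return index

def find(path, index, key, prefix_len=5):
    """Seek to the block for key's prefix and scan only that block."""
    key = key.encode() if isinstance(key, str) else key
    span = index.get(key[:prefix_len])
    if span is None:
        return []
    start, end = span
    hits = []
    with open(path, "rb") as fh:
        fh.seek(start)                # jump straight to the first candidate
        while fh.tell() < end:        # stop at the end of the block
            line = fh.readline()
            fields = line.split(None, 1)
            if fields and fields[0] == key:
                hits.append(line.rstrip(b"\n").decode())
    return hits
```

In practice the index would be built once and saved alongside the data file (in Perl, Storable is one option) so that each lookup costs one hash lookup, one seek and a short scan.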

Hope that helps.

Re^2: Using indexing for faster lookup in large file
by anli_ (Novice) on Feb 27, 2015 at 23:00 UTC
    Hi, thanks for your reply. Turning it into an actual database would be an option, and perhaps the best route here. I'm basically looking for something that will both solve this problem and work as a general method for these kinds of issues, as I run into them quite frequently.

    The data is sorted lexicographically on the first number, but this could of course be changed to numerical sorting if that would help. I will look into the tell() and seek() functions, as I'm not familiar with them.
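On the sorting question: the prefix index only needs every prefix to occupy one unbroken run of lines. Assuming the keys are unsigned digit strings, lexicographic order guarantees that, whereas numeric order does not when the numbers have different lengths (for equal-length keys the two orders coincide). The small check below illustrates this with made-up keys; `blocks_are_contiguous` is a hypothetical helper written only for this demonstration.

```python
def blocks_are_contiguous(keys, prefix_len=5):
    """True if every prefix forms a single unbroken run in the given order."""
    closed = set()    # prefixes whose run has already ended
    current = None
    for k in keys:
        p = k[:prefix_len]
        if p != current:
            if p in closed:
                return False  # this prefix reappears after its run ended
            if current is not None:
                closed.add(current)
            current = p
    return True

keys = ["5", "12345", "12346", "50000", "123450"]
print(blocks_are_contiguous(sorted(keys)))           # lexicographic: True
print(blocks_are_contiguous(sorted(keys, key=int)))  # numeric: False
```

In numeric order `123450` lands after `50000`, splitting the `12345` block in two, so keeping the existing lexicographic sort is the better fit for this approach.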