in reply to Efficient way to handle huge number of records?
If you’ve got the memory, and it is uncontested memory (i.e. swapping will not occur), then your problem is indeed an easy one: “throw silicon at it,” and be done. Nothing except the CPU itself is faster than RAM. If you don’t have enough memory (but you do have a 64-bit machine), well, how much do those chips cost?
Obviously, there is time spent loading all that data into memory, which may or may not be acceptable. (It really depends on how much of the data you expect to process at any one time. It would be a one-time cost per run.)
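For instance, a minimal sketch of the in-memory approach might look like this in Perl. The file name records.dat and the tab-delimited, key-first record layout are my assumptions, not anything from your post; adjust to your format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Slurp every record into a hash keyed on its first field; after the
# one-time load cost, every lookup is just an in-memory hash fetch.
my %record;
open my $fh, '<', 'records.dat' or die "records.dat: $!";
while (my $line = <$fh>) {
    my ($key) = split /\t/, $line, 2;   # assumption: tab-delimited, key first
    $record{$key} = $line;
}
close $fh;

# From here on, lookups never touch the disk:
print $record{'some-key'} // "some-key: not in this file\n";
```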
Another possibility is to use (say...) an SQLite database file purely as an index to the file, storing the starting-position of the interesting bits as integer offsets from the beginning of the file. Use the index to find what you’re looking for. Also consider sorting this list of offsets (in memory, of course) into ascending order so that the hardware can zip right through the file sequentially from front to back, seeking to always-forward positions as needed. May not make a difference, but it might.
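Here is a rough sketch of that idea using DBI and DBD::SQLite. Again, the file name, the tab-delimited layout, and the idx table are illustrative assumptions; the point is the tell/seek pairing and the ORDER BY offset so the lookups walk the file front to back.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Fcntl qw(SEEK_SET);

my $datafile = 'records.dat';           # hypothetical data file
my $dbh = DBI->connect("dbi:SQLite:dbname=records.idx", "", "",
                       { RaiseError => 1, AutoCommit => 0 });

# Pass 1: remember where each record starts (byte offset), not the record itself.
$dbh->do("CREATE TABLE IF NOT EXISTS idx (key TEXT PRIMARY KEY, offset INTEGER)");
open my $fh, '<', $datafile or die "$datafile: $!";
my $ins = $dbh->prepare("INSERT OR REPLACE INTO idx (key, offset) VALUES (?, ?)");
while (1) {
    my $offset = tell $fh;              # position of the line we are about to read
    my $line   = <$fh>;
    last unless defined $line;
    my ($key) = split /\t/, $line, 2;   # assumption: tab-delimited, key first
    $ins->execute($key, $offset);
}
$dbh->commit;

# Pass 2: fetch the wanted offsets sorted ascending, so every seek moves forward.
my @wanted = @ARGV or die "usage: $0 key [key ...]\n";
my $in = join ',', ('?') x @wanted;
my $rows = $dbh->selectall_arrayref(
    "SELECT key, offset FROM idx WHERE key IN ($in) ORDER BY offset",
    undef, @wanted);
for my $row (@$rows) {
    my ($key, $offset) = @$row;
    seek $fh, $offset, SEEK_SET or die "seek: $!";
    print scalar <$fh>;                 # the record starting at that offset
}
```

You would only run the indexing pass when the data file changes; after that, lookups are a cheap SQLite query plus one seek and one read per record.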
It is easy to get too clever and to spend a lot of time and effort implementing things that sound cool in theory but really do not matter in the end: they do not speed things up enough to repay the time spent writing and debugging them. A simple stopwatch comes in handy. It may well be that you simply keep a hash of what you’re looking for and have the program read the entire file each time (or until it has found everything); even though the run time is “larger than it might be,” it is consistent.
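Something like this, say, with Time::HiRes standing in for the stopwatch (the file name and record layout are assumptions, as before):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Keep a hash of the keys we still need, scan the file front to back,
# and stop as soon as everything has been found.
my %wanted = map { $_ => 1 } @ARGV;
my %found;

my $t0 = time;
open my $fh, '<', 'records.dat' or die "records.dat: $!";
while (my $line = <$fh>) {
    my ($key) = split /\t/, $line, 2;   # assumption: tab-delimited, key first
    next unless delete $wanted{$key};   # skip lines we don't care about
    $found{$key} = $line;
    last unless %wanted;                # everything found; stop reading
}
close $fh;
printf STDERR "scan took %.2f seconds\n", time - $t0;

print $found{$_} for sort keys %found;
warn "not found: $_\n" for sort keys %wanted;
```

Time it against the cleverer alternatives on your real data before deciding any of them is worth keeping.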
Re^2: Efficient way to handle huge number of records? by Anonymous Monk on Dec 13, 2011 at 03:56 UTC