Re^3: performance of File Parsing

What’s killing you, then, is “that enormous hash.” You need to replace that logic.

If you were to plot the throughput of this program, it would describe a nice, exponential curve. When it reaches the “thrash point,” it smashes into the wall and drops dead. That’s my blindfolded prediction, but I’ll bet I’m right on the money.

I suggest stuffing the whole thing into an SQLite database (flat-file), and using queries (within transactions).

“Don’t ‘diddle’ the code to make it faster ... find a better algorithm.”
– Kernighan & Plauger; The Elements of Programming Style.