in reply to how to split huge file reading into multiple threads
My intuitive guess, based on your task description, is that your algorithm builds its working set in memory, and what is therefore actually happening is "classic thrashing": the data has outgrown physical RAM, so the machine spends its time swapping pages instead of doing work. In that case, threads won't help at all; they would all just be fighting over the same saturated disk.
Consider ways to use disk-based sorting to manage the files, e.g. the standard sort(1) utility, which is built for inputs far larger than RAM. Or, put the data into an SQLite database (a single disk file...) and use its indexing and querying capability. The bottom line is ... don't do anything "in memory." That means: no hashes, no lists, no "potentially big things in memory" at all.
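To illustrate, here is a minimal Perl sketch of the SQLite route, using DBI with DBD::SQLite. The file name, table name, column names, and tab-delimited format are all assumptions for the example; adapt them to your records:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Open (or create) the database -- a single file on disk.
    my $dbh = DBI->connect("dbi:SQLite:dbname=records.db", "", "",
        { RaiseError => 1, AutoCommit => 0 });

    $dbh->do("CREATE TABLE IF NOT EXISTS records (key TEXT, value TEXT)");

    # Load the huge file row-by-row; nothing accumulates in memory.
    my $ins = $dbh->prepare("INSERT INTO records (key, value) VALUES (?, ?)");
    open my $fh, '<', 'huge_file.txt' or die "open: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;   # assumed tab-delimited
        $ins->execute($key, $value);
    }
    close $fh;
    $dbh->commit;   # one big transaction: vastly faster than per-row commits

    # Index once, after the bulk load, then let SQLite do the lookups.
    $dbh->do("CREATE INDEX IF NOT EXISTS idx_key ON records (key)");
    $dbh->commit;

    my ($count) = $dbh->selectrow_array(
        "SELECT COUNT(*) FROM records WHERE key = ?", undef, 'some-key');
    print "some-key appears $count times\n";
    $dbh->disconnect;

Note the design choice of wrapping the whole load in a single transaction and creating the index after the bulk insert; doing either the other way around makes SQLite fsync on every row and slows the load down enormously.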
An appropriate redesign should not blink at "millions of records." But we do know the classic performance curve caused by thrashing: degradation is not linear but exponential, falling off a cliff the moment the working set no longer fits in RAM. When you say "2+ hours," that fairly screams thrashing to me.
Easy test: fire up the program and, in a separate window, use a system monitor to watch the swap I/O rate and the page-fault rate. If they are, as I suspect they will be, "huge," then there's your answer.
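On Linux, for instance (other platforms have their own equivalents), two standard tools will show this directly:

    $ vmstat 1    # watch the si/so columns: swap-in / swap-out rates
    $ sar -B 1    # fault/s (all page faults) and majflt/s (major faults, the ones that hit disk)

Sustained non-zero si/so, or a high majflt/s, while your program runs is the signature of thrashing.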