in reply to 15 billion row text file and row deletes - Best Practice?
How many serials are in the "file that has a list of serials to delete"?
If it is a relatively small number, you could read them into a hash. Then read through the 15 billion line file line-by-line (avoiding the need to keep the whole thing in memory at once); if the line's serial is in the "delete" hash, skip it, otherwise print it to a new output file.
You'll need enough drive space to accommodate the new output file, of course, but this would accomplish the goal without using a db, with only minimal memory requirements, and with only a single pass through the 15 billion line file.
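A minimal sketch of that approach (the file names, and the assumption that the serial is the first whitespace-separated field on each line, are illustrative; adjust to your real record layout):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup hash from the (relatively small) delete list.
my %delete;
open my $del_fh, '<', 'serials_to_delete.txt'
    or die "Can't open delete list: $!";
while ( my $serial = <$del_fh> ) {
    chomp $serial;
    $delete{$serial} = 1;
}
close $del_fh;

# Stream the huge file one line at a time; never hold it all in memory.
open my $in,  '<', 'big_file.txt'     or die "Can't open input: $!";
open my $out, '>', 'big_file.new.txt' or die "Can't open output: $!";
while ( my $line = <$in> ) {
    # Assumes the serial is the first whitespace-separated field.
    my ($serial) = split ' ', $line;
    print {$out} $line unless exists $delete{$serial};
}
close $in;
close $out or die "Error closing output: $!";
```

Once you've verified the new file, rename it over the original; until then the untouched original doubles as your rollback point.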
Replies are listed 'Best First'.
Re^2: 15 billion row text file and row deletes - Best Practice?
- by friedo (Prior) on Dec 01, 2006 at 05:24 UTC
- by bobf (Monsignor) on Dec 01, 2006 at 05:35 UTC
- by awohld (Hermit) on Dec 01, 2006 at 05:29 UTC
- by davido (Cardinal) on Dec 01, 2006 at 06:03 UTC
- by jhourcle (Prior) on Dec 01, 2006 at 15:14 UTC
- by djp (Hermit) on Dec 04, 2006 at 02:36 UTC