in reply to Re^3: 15 billion row text file and row deletes - Best Practice?
in thread 15 billion row text file and row deletes - Best Practice?
But how many serials do you have to delete? If the delete list is small enough to fit in memory, you don't need to hold the whole master file in a hash or a database. Iterate over the input file one line at a time. For each line, check the delete hash to see whether it is a line you need to eliminate. If it is, next; otherwise, print it to your new output file. Move on to the next line... lather, rinse, repeat.
If practical, hold the delete list in an in-memory hash. If it's not practical to do so, hold the delete list in a database. But leave the master list in a flat file.
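A minimal sketch of that loop, under some assumptions that are not in the original question: the delete list has one serial per line, the serial is the first comma-separated field of each master row, and the sub name filter_master and the file names are hypothetical. Adjust the split to whatever the real row layout is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, skipping every row whose
# serial appears in $delete_path. Assumes one serial per line in the delete
# list, and that the serial is the first comma-separated field of each
# master row (both are assumptions for illustration).
sub filter_master {
    my ($delete_path, $in_path, $out_path) = @_;

    # Load the (small) delete list into a hash for constant-time lookups.
    my %delete;
    open my $del, '<', $delete_path or die "Cannot open $delete_path: $!";
    while (my $serial = <$del>) {
        chomp $serial;
        next if $serial eq '';
        $delete{$serial} = 1;
    }
    close $del;

    open my $in,  '<', $in_path  or die "Cannot open $in_path: $!";
    open my $out, '>', $out_path or die "Cannot open $out_path: $!";

    # Stream the master file; only one row is in memory at a time.
    my ($kept, $dropped) = (0, 0);
    while (my $line = <$in>) {
        my ($serial) = split /,/, $line, 2;
        chomp $serial;
        if (exists $delete{$serial}) {
            $dropped++;
            next;    # this row is on the delete list: skip it
        }
        print {$out} $line;
        $kept++;
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";

    return ($kept, $dropped);
}

# Only runs when three file names are given on the command line.
if (@ARGV == 3) {
    my ($kept, $dropped) = filter_master(@ARGV);
    print STDERR "kept $kept rows, dropped $dropped rows\n";
}
```

Run it as something like perl filter.pl delete.txt master.txt master.new, then swap the new file in for the old one once you are happy with it. Memory use depends on the size of the delete list, not on the 15 billion rows.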
Dave
Re^5: 15 billion row text file and row deletes - Best Practice?
by jhourcle (Prior) on Dec 01, 2006 at 15:14 UTC
by djp (Hermit) on Dec 04, 2006 at 02:36 UTC