in reply to 15 billion row text file and row deletes - Best Practice?

If you have to work with the text file, then I would recommend using sed instead of perl.
# If your version of sed supports editing in place
sed -i -e '/^00020123837$/d' somefile.txt

# Otherwise
sed -e '/^00020123837$/d' somefile.txt > tmp.txt
mv tmp.txt somefile.txt
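For comparison, the perl equivalent of that in-place edit would look something like this (a sketch, reusing the same example file and serial number):

perl -i -ne 'print unless /^00020123837$/' somefile.txt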
If this is done regularly, or other maintenance work is going to be done, then a database becomes a much more attractive option.
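As a minimal sketch of the database route, assuming SQLite via DBD::SQLite is available (the database file, table, and column names here are made up for illustration):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Connect to (or create) a SQLite database file; "serials.db"
# is a made-up name for this sketch.
my $dbh = DBI->connect("dbi:SQLite:dbname=serials.db", "", "",
                       { RaiseError => 1, AutoCommit => 0 });

# One-time load: each line of the text file becomes a row.
$dbh->do("CREATE TABLE IF NOT EXISTS rows (serial TEXT PRIMARY KEY)");
my $ins = $dbh->prepare("INSERT OR IGNORE INTO rows (serial) VALUES (?)");
open my $fh, '<', 'somefile.txt' or die "Can't open somefile.txt: $!";
while (my $line = <$fh>) {
    chomp $line;
    $ins->execute($line);
}
close $fh;
$dbh->commit;

# Once loaded, a delete is a cheap indexed operation rather
# than a rewrite of the entire file.
$dbh->do("DELETE FROM rows WHERE serial = ?", undef, '00020123837');
$dbh->commit;
$dbh->disconnect;

At 15 billion rows you would want to batch the inserts (or use the sqlite3 .import command) and treat the load as a one-time cost; after that, each delete touches an index instead of rewriting the whole file.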

Re^2: 15 billion row text file and row deletes - Best Practice?
by graff (Chancellor) on Dec 01, 2006 at 06:08 UTC
    And bear in mind that sed supports the use of an "edit script" file -- one could take a list of patterns whose matching lines should be deleted, and turn that into an edit script. Based on the OP's description, the list of serial numbers to kill could be saved as:
    /0001234/d
    /0004567/d
    /0089123/d
    ...
    If that's in a file called "kill.list", then just run sed like this:
    sed -f kill.list big.file > tmp.copy
    mv tmp.copy big.file
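    If the raw serial numbers are sitting in a plain file, the edit script itself can be generated with a one-liner -- a sketch, assuming one serial number per line in a file called serials.txt (a made-up name), and anchoring each pattern so a short serial can't match inside a longer one:

    perl -ne 'chomp; print "/^$_\$/d\n"' serials.txt > kill.list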
    On a file of a few hundred GB, I agree that using sed for this very simple sort of editing would be a significant win (it would save a lot of run time).