in reply to 15 billion row text file and row deletes - Best Practice?

If you have to work with the text file, then I would recommend using sed instead of perl.
# If your version of sed supports editing in place
sed -i -e '/^00020123837$/d' somefile.txt

# Otherwise
sed -e '/^00020123837$/d' somefile.txt > tmp.txt
mv tmp.txt somefile.txt
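For comparison, the perl equivalent of that in-place edit would look something like this (a sketch, reusing the same example file and serial number):

perl -i -ne 'print unless /^00020123837$/' somefile.txt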
If this is done regularly, or other maintenance work is going to be done, then a database becomes a much more attractive option.
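As a minimal sketch of the database route, assuming SQLite via DBD::SQLite is available (the database file, table, and column names here are made up for illustration):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Connect to (or create) a SQLite database file; "serials.db"
# is a made-up name for this sketch.
my $dbh = DBI->connect("dbi:SQLite:dbname=serials.db", "", "",
                       { RaiseError => 1, AutoCommit => 0 });

# One-time load: each line of the text file becomes a row.
$dbh->do("CREATE TABLE IF NOT EXISTS rows (serial TEXT PRIMARY KEY)");
my $ins = $dbh->prepare("INSERT OR IGNORE INTO rows (serial) VALUES (?)");
open my $fh, '<', 'somefile.txt' or die "Can't open somefile.txt: $!";
while (my $line = <$fh>) {
    chomp $line;
    $ins->execute($line);
}
close $fh;
$dbh->commit;

# Once loaded, a delete is a cheap indexed operation rather
# than a rewrite of the entire file.
$dbh->do("DELETE FROM rows WHERE serial = ?", undef, '00020123837');
$dbh->commit;
$dbh->disconnect;

At 15 billion rows you would want to batch the inserts (or use the sqlite3 .import command) and treat the load as a one-time cost; after that, each delete touches an index instead of rewriting the whole file.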

Re^2: 15 billion row text file and row deletes - Best Practice?
by graff (Chancellor) on Dec 01, 2006 at 06:08 UTC
    And bear in mind that sed supports the use of an "edit script" file -- one could take a list of patterns whose matching lines should be deleted, and turn that into an edit script. Based on the OP's description, the list of serial numbers to kill could be saved as:
    /0001234/d
    /0004567/d
    /0089123/d
    ...
    If that's in a file called "kill.list", then just run sed like this:
    sed -f kill.list big.file > tmp.copy
    mv tmp.copy big.file
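    If the raw serial numbers are sitting in a plain file, the edit script itself can be generated with a one-liner -- a sketch, assuming one serial number per line in a file called serials.txt (a made-up name), and anchoring each pattern so a short serial can't match inside a longer one:

    perl -ne 'chomp; print "/^$_\$/d\n"' serials.txt > kill.list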
    On a file of a few hundred GB, I agree that using sed for this very simple sort of editing would be a significant win (it would save a lot of run time).