in reply to 15 billion row text file and row deletes - Best Practice?

Nice thread. I would treat the file as a database.

A Unix solution would be:

  • cut the kill file into k chunks
  • do fast fixed-string matching against the serial file with grep
  • grep -nF -f chunk serial_file > delete_file # you need k passes, one per chunk!
  • with perl, read the line numbers n from delete_file and, for each n, overwrite that line of the serial file with "X" x length $_ (same length, so no later byte offsets move); a rough sketch follows this list
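
Roughly, and untested, the whole pass could look like the script below. All the file names, the chunk prefix and the chunk count $k are placeholders, and it assumes split(1), GNU grep and a perl with large-file support:

  #!/usr/bin/perl
  use strict;
  use warnings;

  # Placeholder names; point these at your real files.
  my $kill_file   = 'kill_file';
  my $serial_file = 'serial_file';
  my $delete_file = 'delete_file';
  my $k           = 20;              # number of kill-file chunks; tune to your RAM

  # Step 1: cut the kill file into k chunks so each grep -f pattern set stays small.
  my $entries = 0;
  open my $kf, '<', $kill_file or die "open $kill_file: $!";
  $entries++ while <$kf>;
  close $kf;
  my $per_chunk = int($entries / $k) + 1;
  system('split', '-l', $per_chunk, $kill_file, 'chunk.') == 0
      or die "split failed: $?";

  # Step 2: one grep pass per chunk; -F fixed strings, -n line numbers.
  open my $out, '>', $delete_file or die "open $delete_file: $!";
  for my $chunk (glob 'chunk.*') {
      open my $grep, '-|', 'grep', '-nF', '-f', $chunk, $serial_file
          or die "grep: $!";
      print {$out} $_ while <$grep>;
      close $grep;
  }
  close $out;

  # Step 3: collect the doomed line numbers ("lineno:match" from grep -n).
  # For a really big delete list you would stream these instead of keeping a hash.
  my %kill;
  open my $df, '<', $delete_file or die "open $delete_file: $!";
  while (<$df>) { $kill{$1} = 1 if /^(\d+):/ }
  close $df;

  # Step 4: walk the serial file once, blanking each doomed line in place
  # with X's of the same length, so nothing after it has to move.
  open my $fh, '+<', $serial_file or die "open $serial_file: $!";
  my $lineno = 0;
  while (1) {
      my $offset = tell $fh;         # byte offset of the line about to be read
      my $line   = <$fh>;
      last unless defined $line;
      $lineno++;
      next unless delete $kill{$lineno};
      my $len = length $line;
      $line =~ s/\R\z//;             # keep the newline, blank only the payload
      seek $fh, $offset, 0;          # a seek is required between read and write
      print {$fh} 'X' x length($line);
      seek $fh, $offset + $len, 0;   # resume reading where we left off
  }
  close $fh;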

    if you later add entries, put each one into the first X-ed line it fits in, or else append it at the end (see the sketch below)
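
A guess at that slot-reuse step, with a made-up add_entry() helper; it pads the leftover space with X's, so if trailing X's would corrupt your record format, insist on an exact length match instead:

  # Hypothetical helper: drop $record into the first all-X line it fits in,
  # padding the leftover space with X's; otherwise append at end of file.
  sub add_entry {
      my ($serial_file, $record) = @_;
      open my $fh, '+<', $serial_file or die "open $serial_file: $!";
      while (1) {
          my $offset = tell $fh;
          my $line   = <$fh>;
          last unless defined $line;
          $line =~ s/\R\z//;
          next unless $line =~ /\AX+\z/;               # only reuse blanked slots
          next if length($line) < length($record);     # the record has to fit
          seek $fh, $offset, 0;
          print {$fh} $record, 'X' x (length($line) - length($record));
          close $fh;
          return;
      }
      seek $fh, 0, 2;                                  # no free slot: append
      print {$fh} $record, "\n";
      close $fh;
  }

Scanning from the top every time will crawl on a file this size, so in practice you would probably keep a free-slot list on the side.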

    (update) if you need to repeat the process a few times, it may be worth sorting the serial file first
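
If it comes to that, I would not try to sort it in Perl itself; GNU sort already does an external merge sort on disk. The temp directory, buffer size and parallelism below are only guesses to tune for your box:

  # Let GNU sort do the external merge sort; adjust -T/-S/--parallel to taste.
  system('sort', '-T', '/big/tmp', '-S', '50%', '--parallel=8',
         '-o', 'serial_file.sorted', 'serial_file') == 0
      or die "sort failed: $?";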
