shivapm has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am new to Perl. I need to know the best approach to modifying a large text file (>15GB). I need to search for and delete some data, row by row. Waiting for suggestions. The text file looks like this:

abc|xhagc|cgac|hchgc

Here I need to search on the first column key and delete the matching row as quickly as possible.

Re: File modification (Updated.)
by BrowserUk (Patriarch) on Mar 07, 2016 at 12:02 UTC

    Try something like this; it outperforms most other methods I've seen. You might want to try adjusting BUFSIZE. 64k works best on my system, but yours may differ.

    Updated: Corrected handling of partial buffer.

    #! perl -slw
    use strict;
    use constant BUFSIZE => 65536;

    # Rows starting with this term are deleted.
    chomp( my $searchTerm = $ARGV[ 0 ] // 'five' );

    # Open for read and write: kept rows are written back over the same file.
    open BIGFILE, '+<', 'theFile' or die $!;

    my $readPos = my $writePos = 0;
    my $buffer = '';

    while( <BIGFILE> ) {
        $readPos = tell BIGFILE;
        if( ! m[^\Q$searchTerm\E] ) {
            $buffer .= $_;
            if( length( $buffer ) > BUFSIZE ) {
                # Flush the kept rows at the write position,
                # then resume reading where we left off.
                sysseek BIGFILE, $writePos, 0;
                syswrite BIGFILE, $buffer;
                $writePos = sysseek BIGFILE, 0, 1;
                $buffer = '';
                seek BIGFILE, $readPos, 0;
            }
        }
    }

    # Flush whatever is left in the partial buffer.
    if( length( $buffer ) ){
        sysseek BIGFILE, $writePos, 0;
        syswrite BIGFILE, $buffer;
        $writePos = sysseek BIGFILE, 0, 1;
    }

    # Cut the file off after the last kept row.
    truncate BIGFILE, $writePos;
    close BIGFILE;
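
    One thing to watch with the match m[^\Q$searchTerm\E]: it is a prefix match, so the term abc also removes rows whose key is abcd. If the key must equal the whole first field, the pattern can also require the "|" separator. A minimal sketch, assuming every row contains at least one "|"; key_pattern is a hypothetical helper name, not part of the script above:

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): build a pattern that
# matches only rows whose first "|"-delimited field is exactly $key.
sub key_pattern {
    my ($key) = @_;
    # \Q...\E quotes regex metacharacters in the key;
    # \| requires the field separator right after it.
    return qr/^\Q$key\E\|/;
}

my $re = key_pattern('abc');

# First row has key 'abc'; second row has key 'abcd'.
for my $row ( "abc|xhagc|cgac|hchgc", "abcd|xhagc|cgac|hchgc" ) {
    print $row =~ $re ? "delete: $row\n" : "keep:   $row\n";
}
```

    In the script, this would mean testing each line against such a pattern instead of the bare prefix.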

Re: File modification
by FreeBeerReekingMonk (Deacon) on Mar 08, 2016 at 08:38 UTC
    Question:

    Do you have long rows that differ from one another, like these:

    searchme1|data1|more|auxiliary|data|that|is|very|long
    searchmetoo|and|some|parametric|data|that|can|be|discarded
    thirdsearch|and|the|file|looks|like|this?

    Where the first field, up to the first "|" separator, is the search term you need to match, and the rest of the row can be ignored?

    And somehow, this does not work or takes too long due to the long lines:

    perl -ne 'print unless /search_this_text/' lines.txt
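
    If the one-liner matches too much (the pattern can hit anywhere in the row, not just the first field), the same streaming idea can be restricted to the key. A rough sketch, assuming every row has at least one "|" and that there is enough disk space for a second copy of the file; delete_rows_by_key and the file names in the note below are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out line by line, skipping every row
# whose first "|"-delimited field equals $key exactly.
# Returns the number of rows that were dropped.
sub delete_rows_by_key {
    my ( $in, $out, $key ) = @_;

    # Key plus separator, so 'abc' does not also match 'abcd|...'.
    my $prefix = "$key|";
    my $len    = length $prefix;

    open my $src, '<', $in  or die "Cannot read $in: $!";
    open my $dst, '>', $out or die "Cannot write $out: $!";

    my $dropped = 0;
    while ( my $line = <$src> ) {
        # Compare only the start of the line; long rows are never scanned.
        if ( substr( $line, 0, $len ) eq $prefix ) {
            $dropped++;
            next;
        }
        print {$dst} $line;
    }

    close $src;
    close $dst or die "Cannot close $out: $!";
    return $dropped;
}
```

    It could be called as delete_rows_by_key( 'lines.txt', 'lines.txt.new', 'abc' ), followed by a rename of lines.txt.new over lines.txt once the copy has finished. Only one line is held in memory at a time, so the 15GB size is not a problem for RAM, only for disk space and I/O time.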

    Final question: What is the "modification" you need on that file? Mark a certain field, make the line start with #, or remove the full line from the big file? (I.e., will the file need the same space, or will the file size change after such an edit?)