ngbabu has asked for the wisdom of the Perl Monks concerning the following question:

Dear All, I am using the following code to replace certain information in binary mode.
$s=time();
open(FH, "$ARGV[0]");
open(OUT, ">$ARGV[1]");
binmode FH;
binmode OUT;
$/=undef;
$line=<FH>;
$line=~s!(\d{3}\s(\/[^\n]*? f1)\s*([^\n]+sh\s*)+?\d{3}\s)ns!$1$2!gs while($line=~/(\d{3}\s(\/[^\n]*? f1)\s*([^\n]+sh\s*)+?\d{3}\s)ns/gs);
print OUT $line;
$e=time();
$r=$e-$s;
close(FH);
close(OUT);
print "Done...\nRuntime: $r seconds";

This code loads the entire file content into memory and then does the replacement. Reading line by line would avoid the out-of-memory problem, but my replacement depends on a previous line. The input looks like this:

224 /EuclidSymbol f1
(D) -22 673 sh
.....
320 ns
.....
221 ns

The output should be as follows:

224 /EuclidSymbol f1
(D) -22 673 sh
.....
320 /EuclidSymbol f1
.....
221 /EuclidSymbol f1

I tried Tie::File, but it does not load binary data. Please suggest how I can solve this problem. My file is around 3GB.

Regards,
Ganesh

Replies are listed 'Best First'.
Re: Out of Memory
by Anonymous Monk on Mar 15, 2010 at 12:13 UTC
Re: Out of Memory
by shmem (Chancellor) on Mar 15, 2010 at 22:46 UTC

    Since you can't load a 3GB file into memory, you have to read it in chunks, examine what has been read so far, do your substitutions and write what's processed to another file. There are many ways to do that; here is one:

    #!/usr/bin/perl
    my $binfile = 'binfile';
    my $outfile = 'testfile';
    my $re1 = qr{blorf};   # that is the regex which saves the token
    my $re2 = qr{foo bar}; # that is the regex for following lines
    open my $in, '<', $binfile or die "'$binfile': $!\n";
    open my $out, '>', $outfile or die "'$outfile': $!\n";
    binmode $in;           # the data is binary, so avoid any newline translation
    binmode $out;
    my $buf;   # empty
    my $found; # not yet
    while(! eof $in) {
        if ($found && /$re2/g) {
            # do substitution
            s/$re2/quux $1 $found/;
        }
        elsif (/($re1)/g) {
            # if we match the start of a section,
            # we remember what we need and write all things read
            # up to the position of the current match.
            $found = $1;
            my $end = pos;   # pos is already the offset just past the match
            my $chunk = substr $_, 0, $end;
            print $out $chunk;
            substr $_, 0, $end, '';
        }
        else {
            read($in, $buf, 50);
            $_ .= $buf;
        }
    }
    print $out $_; # spit out last chunk

    I hope that's readable...

    See pos, perlre.
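
If the data is newline-terminated as in the sample input, a simpler variant of the same idea is to read one line at a time and carry only the last font token forward. The following is a minimal sketch under that assumption, not a drop-in replacement for the original regex: `replace_ns` is a hypothetical helper name, the patterns are guessed from the sample (a font line is taken to be `NNN /Name f1` on a line of its own, and a line to fix is `NNN ns`), and unlike the original regex it does not require `sh` lines in between. A file without newlines would still be read as one huge line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out line by line, remembering the most
# recent "/Name f1" token and putting it in place of "ns" on "NNN ns" lines.
sub replace_ns {
    my ($in, $out) = @_;
    binmode $in;     # binary data: no newline translation
    binmode $out;
    my $font;        # last "/Name f1" token seen; undef until the first one
    while (my $line = <$in>) {
        if ($line =~ m{^\d{3}\s(/[^\n]*? f1)\s*$}) {
            # A font-selection line (pattern assumed from the sample input).
            $font = $1;
        }
        elsif (defined $font) {
            # Replace a trailing "ns"; the lookahead keeps the line ending.
            $line =~ s{^(\d{3}\s)ns(?=\s*$)}{$1$font};
        }
        print {$out} $line;
    }
    return;
}

# Run as a script only when both file names are given on the command line.
if (@ARGV == 2) {
    my ($src, $dst) = @ARGV;
    open my $in,  '<', $src or die "'$src': $!\n";
    open my $out, '>', $dst or die "'$dst': $!\n";
    replace_ns($in, $out);
    close $out or die "'$dst': $!\n";
}
```

Memory use is bounded by the longest line rather than by the file size, which is the point of the exercise for a 3GB input.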