pwagyi has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I have a Perl script that makes a minor modification to the content of a large file. The problem is that the modification is needed at the beginning of the file, and since the file is huge (>2GB), it can sometimes take a very long time just to copy the remaining data. Is there any way to make it almost as fast as copying (say, using the cp command)?

    while( data = read_data(fh) ) {
        if( data_to_modify(data) ) {
            out_fh->write(modified_data);
            last; # break out
        }
    }
    while( read(fh,buffer,buffer_size) ) {
        out_fh->write(buffer);
    }

Replies are listed 'Best First'.
Re: Modifying small part of large file
by thanos1983 (Parson) on Dec 29, 2017 at 11:22 UTC

    Hello pwagyi

    I am not sure whether the names you use (e.g. read_data, data_to_modify, out_fh->write) are your own functions or come from a module. Since we do not have this information, I will assume that they are your own functions.

    What exactly are you trying to do? Are you simply reading from and writing to a file, or is something else happening as well? I see from your code that you read and write twice; why? Also, why are you using last instead of next? I would suggest using defined to avoid problems with lines after chomp. Also, why not skip the blank lines in your file? That could speed up reading as well (in case there are empty lines).

    Sample of code:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use feature 'say';

        my $file = 'file.txt';
        open(my $fh, "<", $file) or die "Can't open ".$file." error: $!";

        while( defined( my $line = <$fh> ) ) {
            chomp $line;
            next if $line =~ /^\s*$/; # skip empty lines
            next if (index($line, 'second') != -1);
            say $line;
        }

        close $fh or warn "Can't close ".$file." error: $!";

        __DATA__
        This is the first line in the file.
        This is the second line in the file.
        This is the third line in the file.
        This is the forth line in the file.

        __OUTPUT__
        $ perl test.pl
        This is the first line in the file.
        This is the third line in the file.
        This is the forth line in the file.

    If you want to read from and write to a file, have you tried Tie::File? Here is a sample using the IO::All module, which includes Tie::File support.

        #!/usr/bin/perl
        use strict;
        use IO::All;
        use warnings;
        use Data::Dumper;
        use feature 'say';

        my $io = io 'file.txt';

        # Miscellaneous:
        my @lines = $io->chomp->slurp;
        print Dumper \@lines;

        # Tie::File support:
        $io->[2] = 'This is the changed line in the file.'; # Change a line
        say $io->[@$io / 2];                                # Print middle line
        print Dumper \@lines;

        __DATA__
        This is the first line in the file.
        This is the second line in the file.
        This is the third line in the file.
        This is the forth line in the file.

        __OUTPUT__
        $ perl test.pl
        $VAR1 = [
                  'This is the first line in the file.',
                  'This is the second line in the file.',
                  'This is the third line in the file.',
                  'This is the forth line in the file.'
                ];
        This is the changed line in the file.
        $VAR1 = [
                  'This is the first line in the file.',
                  'This is the second line in the file.',
                  'This is the changed line in the file.',
                  'This is the forth line in the file.'
                ];

    Update: I noticed that you may be working with binary files, so my sample code may not apply in your case. Let's assume for the moment that you are working with binary files. My question remains: since you are modifying the file, why do you need to copy it again afterwards, given that the original file is already updated to your new version? If for any reason you want to create a copy of the altered file and you think that cp is faster than copying the whole file through Perl, why not run it from the command line? You can use Capture::Tiny for this job.

    Sample of code:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Capture::Tiny 'capture';

        my $original = 'file.txt';
        my $copy     = 'fileTest.txt';
        my $cmd      = qq{cp -v $original $copy};

        # capture from external command
        my ($stdout, $stderr, $exitCode) = capture {
            system( $cmd );
        };

        print 'StdOut: ' . $stdout if $exitCode == 0;
        print 'Error: ' . $stderr unless $exitCode == 0;

        __END__
        $ perl test.pl
        StdOut: 'file.txt' -> 'fileTest.txt'

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Modifying small part of large file
by Anonymous Monk on Dec 29, 2017 at 07:45 UTC
    What does "modify" mean exactly? Are you merely flipping bits or are you adding bytes?
      'Modify' as in adding/removing bytes. Output file might be bigger or smaller.
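
Since bytes are being added or removed near the start, everything after the change generally has to be rewritten on an ordinary file, so the copy itself cannot be avoided. What can be tuned is how the remainder is copied. Below is a minimal sketch, not taken from the thread: it writes a replacement for the first part of the file and then streams the rest in large unbuffered blocks with sysread/syswrite. The names rewrite_head and write_all, the parameter layout, and the 4 MiB default block size are all assumptions made for illustration; the sketch also assumes the region to replace is a known number of bytes at the very start of the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: write all of $data to $fh, retrying on short writes.
sub write_all {
    my ($fh, $data) = @_;
    my $off = 0;
    while ($off < length $data) {
        my $n = syswrite($fh, $data, length($data) - $off, $off);
        die "Write failed: $!" unless defined $n;
        $off += $n;
    }
}

# Hypothetical helper: copy $src to $dst, replacing the first $old_len
# bytes with $new_head and streaming the remainder in large blocks.
sub rewrite_head {
    my ($src, $dst, $old_len, $new_head, $block_size) = @_;
    $block_size ||= 4 * 1024 * 1024;   # assumed default; tune by measuring

    open(my $in,  '<:raw', $src) or die "Can't open $src: $!";
    open(my $out, '>:raw', $dst) or die "Can't open $dst: $!";

    # Emit the replacement, then skip the bytes it replaces in the source.
    write_all($out, $new_head);
    sysseek($in, $old_len, SEEK_SET) or die "Can't seek in $src: $!";

    # Bulk-copy the rest without Perl's buffered I/O layer.
    while (1) {
        my $n = sysread($in, my $buf, $block_size);
        die "Read failed: $!" unless defined $n;
        last if $n == 0;               # end of file
        write_all($out, $buf);
    }

    close $in  or die "Can't close $src: $!";
    close $out or die "Can't close $dst: $!";
}
```

A call such as rewrite_head('big.dat', 'big.new', 128, $new_header) would replace the first 128 bytes; the file names and the length here are placeholders. Whether this gets close to cp depends on the system, so it is worth timing a few block sizes against cp on the real data.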