in reply to Write large array to file, very slow

foreach (@mergedlogs) {
    open FH, ">>mergedlogs.txt" or die "can't open mergedlogs.txt: $!";
    print FH "$_\n";
    close FH;
}

You are opening and closing the file on every single record. Don't do that. Instead:

>mergedlogs.txt"">
open FH, ">>mergedlogs.txt" or die "can't open mergedlogs.txt: $!";
$| = 0;    # just in case
foreach (@mergedlogs) {
    print FH "$_\n";
}
close FH;

There are other ways this could be improved, but this should get you a large gain for little effort.
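
For example (a minimal sketch of one such further improvement, not something from the original thread), the whole array can be handed to a single print call, so Perl issues one large buffered write instead of one per record, at the cost of temporarily holding a joined copy of the data in memory:

open my $fh, ">>", "mergedlogs.txt" or die "can't open mergedlogs.txt: $!";
# join builds one big string in memory, then a single print writes it all out
print $fh join("\n", @mergedlogs), "\n";
close $fh;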

Re^2: Write large array to file, very slow
by Eily (Monsignor) on Aug 20, 2018 at 14:35 UTC

>" mode was used">
    The ">>" mode was used precisely because the file is constantly being reopened. But with your proposition (++), the preceding unlink can be removed, letting ">" overwrite the file instead.
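>" mode was used">

    For illustration, assuming the original script really does unlink the file before the loop (that line isn't quoted in this thread), the combined change would look roughly like this:

    # before (assumed): unlink "mergedlogs.txt"; then open with ">>" for every record
    # after: a single open in ">" (truncate) mode replaces both the unlink and the append mode
    open my $fh, ">", "mergedlogs.txt" or die "can't open mergedlogs.txt: $!";
    print $fh "$_\n" for @mergedlogs;
    close $fh;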

    Besides, the three-argument version of open with a lexical filehandle can be used for many reasons (elegance, safety, ...), but at the very least for consistency with the way the input files are opened.

    my $output_file = "mergedlogs.txt"; open my $output_fh, ">", $output_file or die "Can't open $output_file: + $!"; { local $| = 0; local $\ = "\n"; # Automatically append \n foreach (@mergedlogs) { print $output_fh $_; # "$_\n" copies $_ into a new string before a +ppending \n } } close $output_fh;
    Although Corion's proposition to write the result straight to the file, without the @mergedlogs intermediate variable, is probably a good idea as well.
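
    As a rough sketch of that idea (how @mergedlogs is actually built isn't shown in this thread, so next_merged_line() below is just a hypothetical placeholder for the merge step):

    open my $out_fh, ">", "mergedlogs.txt" or die "Can't open mergedlogs.txt: $!";
    while ( defined( my $line = next_merged_line() ) ) {    # hypothetical: yields one merged record at a time
        print $out_fh "$line\n";    # write each record as it is produced, no intermediate array
    }
    close $out_fh;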

      Just ran a quick bench, and your suggestion to avoid the copy (++) works very well: it's about twice as fast as my code above. As another test I also tried local $, = "\n"; print $output_fh @mergedlogs; but that's no faster, to within statistical noise. I'll run a longer bench later just to see if the difference is at all significant.
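
      For reference, here is roughly what that single-print variant looks like as a benchmark sub in the style of the code below (the sub name is mine, and I've added local $\ so the last record also gets a trailing newline, which the one-liner above does not produce):

      sub write_single_print {
          my $data = shift;
          open my $fh, ">", "tmp.txt" or die "Can't open output file $!";
          local $, = "\n";     # output field separator between list elements
          local $\ = "\n";     # trailing newline after the last element
          print $fh @$data;    # one print call for the whole array
      }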

        "Twice as fast" seems like a lot for in memory operations when there are also disk accesses. I'm sure there are plenty of things to consider (HW, and data size), but with the following code I couldn't get past a difference of around ~5% (although I did notice that trying that with supidly big files made my computer crash :P):

        use v5.20;
        use strict;
        use warnings;
        use Benchmark qw( cmpthese );
        use Data::Dump qw( pp );

        my $size   = 10;
        my $length = 1E6;
        my @data   = ('X' x $length) x $size;

        sub write_copy {
            open my $fh, ">", "tmp.txt" or die "Can't open output file $!";
            $| = 0;
            my $data = shift;
            for (@$data) {
                print $fh "$_\n";
            }
        }

        sub write_simple {
            local $\ = "\n";
            open my $fh, ">", "tmp.txt" or die "Can't open output file $!";
            $| = 0;
            my $data = shift;
            for (@$data) {
                print $fh $_;
            }
        }

        cmpthese( -15, {
            copy   => sub { write_copy(\@data);   },
            simple => sub { write_simple(\@data); },
        } );

        __END__
                   Rate   copy simple
        copy     27.3/s     --    -5%
        simple   28.8/s     5%     --