in reply to Re^2: Write large array to file, very slow
in thread Write large array to file, very slow

Just ran a quick bench, and your suggestion to avoid the copy (++) works very well: it's about twice as fast as my code above. As another test I also tried local $, = "\n"; print $output_fh @mergedlogs; but that's no faster, to within statistical noise. I'll run a longer bench later just to see if it's at all significant.
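For reference, here's a minimal sketch of the three variants (placeholder data in @mergedlogs; the real logs are much larger, and all three write to one handle here just to show the calls):

    use strict;
    use warnings;

    # Placeholder data standing in for the real merged logs.
    my @mergedlogs = map { "log line $_" } 1 .. 5;
    open my $output_fh, '>', 'mergedlogs.txt'
        or die "Can't open mergedlogs.txt: $!";

    # 1. Interpolation: builds a fresh "$_\n" string (a copy) per element.
    print $output_fh "$_\n" for @mergedlogs;

    # 2. Output record separator: print appends $\ itself, so no
    #    per-element copy is built.
    {
        local $\ = "\n";
        print $output_fh $_ for @mergedlogs;
    }

    # 3. Output field separator: one print for the whole list; $, only goes
    #    *between* elements, so $\ still supplies the trailing newline.
    {
        local ($\, $,) = ("\n", "\n");
        print $output_fh @mergedlogs;
    }
    close $output_fh;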

Re^4: Write large array to file, very slow
by Eily (Monsignor) on Aug 20, 2018 at 16:14 UTC

    "Twice as fast" seems like a lot for in memory operations when there are also disk accesses. I'm sure there are plenty of things to consider (HW, and data size), but with the following code I couldn't get past a difference of around ~5% (although I did notice that trying that with supidly big files made my computer crash :P):

    use v5.20;
    use strict;
    use warnings;
    use Benchmark qw( cmpthese );
    use Data::Dump qw( pp );

    my $size   = 10;
    my $length = 1E6;
    my @data   = ('X' x $length, ) x $size;

    sub write_copy {
        open my $fh, ">", "tmp.txt" or die "Can't open output file $!";
        $| = 0;
        my $data = shift;
        for (@$data) {
            print $fh "$_\n";
        }
    }

    sub write_simple {
        local $\ = "\n";
        open my $fh, ">", "tmp.txt" or die "Can't open output file $!";
        $| = 0;
        my $data = shift;
        for (@$data) {
            print $fh $_;
        }
    }

    cmpthese( -15, {
        copy   => sub { write_copy(\@data); },
        simple => sub { write_simple(\@data); },
    } );

    __END__
             Rate   copy simple
    copy   27.3/s     --    -5%
    simple 28.8/s     5%     --
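    One check worth doing before trusting the numbers (not part of the bench above) is that both subs produce byte-identical output; a quick sketch with the core File::Compare module, reusing @data and the subs from the bench:

        use File::Compare qw( compare );

        # Write with each sub in turn, keeping the first file aside.
        write_copy(\@data);
        rename "tmp.txt", "tmp_copy.txt" or die "Can't rename: $!";
        write_simple(\@data);

        # compare() returns 0 when the two files are byte-identical.
        print compare("tmp.txt", "tmp_copy.txt") == 0
            ? "outputs match\n"
            : "outputs differ!\n";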

      "Twice as fast" seems like a lot for in memory operations when there are also disk accesses.

      Yes, I thought so too. Looks like my data set was so large it ate into swap. :-)

      Re-running with a smaller data set still shows quite a decent speed-up, however. Here's my bench and results:

      #!/usr/bin/env perl
      use strict;
      use warnings;
      use Benchmark 'cmpthese';

      my $size = 50_000_000;
      my @big = (rand () x $size);

      cmpthese (10, {
          'interp' => 'interp ()',
          'Eily'   => 'eily ()',
          'OFS'    => 'ofs ()',
      });

      exit;

      sub interp {
          open FH, '>', 'mergedlogs.txt' or die "can't open mergedlogs.txt: $!";
          local $| = 0;
          foreach (@big) {
              print FH "$_\n";
          }
          close FH;
      }

      sub eily {
          my $output_file = "mergedlogs.txt";
          open my $output_fh, ">", $output_file or die "Can't open $output_file: $!";
          local $| = 0;
          local $\ = "\n";
          foreach (@big) {
              print $output_fh $_;
          }
          close $output_fh;
      }

      sub ofs {
          my $output_file = "mergedlogs.txt";
          open my $output_fh, ">", $output_file or die "Can't open $output_file: $!";
          local $| = 0;
          local $\ = "\n";
          local $, = "\n";
          print $output_fh @big;
          close $output_fh;
      }
             s/iter interp   Eily    OFS
      interp   1.83     --   -35%   -35%
      Eily     1.20    53%     --    -1%
      OFS      1.19    54%     1%     --

        I get this (perl v5.26.2):

               s/iter interp  Eily   OFS
        interp   2.98     --   -5%   -7%
        Eily     2.83     5%    --   -2%
        OFS      2.77     8%    2%    --

        my @big = (rand () x $size);

        What do you expect @big to contain after this, though? It looks like you wanted to make an array of random numbers, but rand is only called once, so you just have one repeated value. Also, x is tricky (I'd even say un-perl-like) because it depends on its operands in a way that no other operator in Perl does. So the first thing I did was check how many elements are in @big: 1, holding 50 000 000 copies of the random value. This means that you are writing just one item, and neither the for loop nor the use of $, has much of an effect (if at all) here. So finding a significant difference between Eily and OFS would have been worrying.
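        To make that concrete, a small demonstration (with the size shrunk to 5):

            use strict;
            use warnings;
            use Data::Dump qw( pp );   # same module as in my bench above

            # x after a plain scalar expression is *string* repetition:
            # rand() is stringified and its digits repeated. One element.
            my @one_big_string = (rand() x 5);

            # x after a parenthesized list is *list* repetition, but rand()
            # is still evaluated only once, so all five values are equal.
            my @same_value = ((rand()) x 5);

            # To get five independent random numbers, call rand per element.
            my @random = map { rand } 1 .. 5;

            pp \@one_big_string;   # 1 element: the digits repeated 5 times over
            pp \@same_value;       # 5 elements, all identical
            pp \@random;           # 5 distinct values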