in reply to Re: Write large array to file, very slow
in thread Write large array to file, very slow

The ">>" mode was used precisely because the file was constantly being reopened. But with your ++proposition the file is only opened once, so the preceding unlink can be removed and ">" can simply overwrite it instead.
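Something like this, roughly (my reconstruction of the before/after, not the original code verbatim):

my $output_file = "mergedlogs.txt";

# Before (as I read the original): wipe the file once, then reopen it in append mode for every write
# unlink $output_file;
# open my $output_fh, ">>", $output_file or die "Can't open $output_file: $!";

# After: a single open in ">" mode truncates the file itself, so the unlink is no longer needed
open my $output_fh, ">", $output_file or die "Can't open $output_file: $!";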

Besides, the 3-arg version of open with a scalar filehandle can be used for many reasons (elegance, safety ...), but at the very least it keeps things consistent with the way the input files are opened.

my $output_file = "mergedlogs.txt";
open my $output_fh, ">", $output_file or die "Can't open $output_file: $!";
{
    local $| = 0;
    local $\ = "\n";         # Automatically append \n
    foreach (@mergedlogs) {
        print $output_fh $_; # "$_\n" copies $_ into a new string before appending \n
    }
}
close $output_fh;
Although Corion's proposition to write the result straight to the file, without the @mergedlogs intermediate variable, is probably a good idea as well.
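For what it's worth, a rough sketch of that variant under my assumptions (next_merged_line() is a hypothetical stand-in for whatever produces the merged lines, since I don't have that part of the code):

my $output_file = "mergedlogs.txt";
open my $output_fh, ">", $output_file or die "Can't open $output_file: $!";
{
    local $\ = "\n";                                   # still append the newline automatically
    while (defined( my $line = next_merged_line() )) { # hypothetical source of merged lines
        print $output_fh $line;                        # write each line as it is produced, no @mergedlogs array
    }
}
close $output_fh;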

Re^3: Write large array to file, very slow
by hippo (Archbishop) on Aug 20, 2018 at 15:24 UTC

    Just ran a quick bench and your suggestion to avoid the copy (++) works very well - it's about twice as fast as my code above. As another test I also tried local $, = "\n"; print $output_fh @mergedlogs; but that's no faster to within statistical noise. I'll run a longer bench later just to see if it's at all significant.

      "Twice as fast" seems like a lot for in memory operations when there are also disk accesses. I'm sure there are plenty of things to consider (hardware, data size, ...), but with the following code I couldn't get past a difference of around ~5% (although I did notice that trying it with stupidly big files made my computer crash :P):

      use v5.20;
      use strict;
      use warnings;
      use Benchmark qw( cmpthese );
      use Data::Dump qw( pp );

      my $size   = 10;
      my $length = 1E6;
      my @data   = ('X' x $length, ) x $size;

      sub write_copy {
          open my $fh, ">", "tmp.txt" or die "Can't open output file $!";
          $| = 0;
          my $data = shift;
          for (@$data) {
              print $fh "$_\n";
          }
      }

      sub write_simple {
          local $\ = "\n";
          open my $fh, ">", "tmp.txt" or die "Can't open output file $!";
          $| = 0;
          my $data = shift;
          for (@$data) {
              print $fh $_;
          }
      }

      cmpthese( -15, {
          copy   => sub { write_copy(\@data); },
          simple => sub { write_simple(\@data); },
      } );

      __END__
                Rate   copy simple
      copy    27.3/s     --    -5%
      simple  28.8/s     5%     --

        "Twice as fast" seems like a lot for in memory operations when there are also disk accesses.

        Yes, I thought so too. Looks like my data set was so large it ate into swap. :-)

        Re-running with a smaller data set still shows quite a decent speed up, however. Here's my bench and results:

        #!/usr/bin/env perl

        use strict;
        use warnings;
        use Benchmark 'cmpthese';

        my $size = 50_000_000;
        my @big  = (rand () x $size);

        cmpthese (10, {
            'interp' => 'interp ()',
            'Eily'   => 'eily ()',
            'OFS'    => 'ofs ()',
        });
        exit;

        sub interp {
            open FH, '>', 'mergedlogs.txt' or die "can't open mergedlogs.txt: $!";
            local $| = 0;
            foreach (@big) {
                print FH "$_\n";
            }
            close FH;
        }

        sub eily {
            my $output_file = "mergedlogs.txt";
            open my $output_fh, ">", $output_file or die "Can't open $output_file: $!";
            local $| = 0;
            local $\ = "\n";
            foreach (@big) {
                print $output_fh $_;
            }
            close $output_fh;
        }

        sub ofs {
            my $output_file = "mergedlogs.txt";
            open my $output_fh, ">", $output_file or die "Can't open $output_file: $!";
            local $| = 0;
            local $\ = "\n";
            local $, = "\n";
            print $output_fh @big;
            close $output_fh;
        }
               s/iter interp  Eily   OFS
        interp   1.83     --  -35%  -35%
        Eily     1.20    53%    --   -1%
        OFS      1.19    54%    1%    --