
In this particular case, no, it would not change. "A 500% increase in performance" is a meaningless statistic until you identify 500% of what. In this case, that base number is too tiny to matter, especially compared with the time your program will spend reading the 700M file from disk and actually processing its contents before it's ready to print any output.
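To put "500% of what" into numbers, Amdahl's law (a standard formula, not something from this thread) gives the overall effect of speeding up only one part of a program. The sketch below assumes, purely for illustration, that printing takes 1% or 5% of the total runtime and that "a 500% increase" means the printing part runs 6 times as fast. Neither figure was measured.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Amdahl's law: if a fraction $p of the runtime is sped up by a factor $s,
# the new runtime (as a fraction of the old one) is (1 - $p) + $p / $s.
sub time_fraction {
    my ($p, $s) = @_;
    return (1 - $p) + $p / $s;
}

# Hypothetical values: printing is 1% or 5% of the total runtime, and
# "a 500% increase" is read as printing becoming 6x faster.
my $speedup = 6;
for my $p (0.01, 0.05) {
    my $saved = 1 - time_fraction($p, $speedup);
    printf "print share %.0f%%: total runtime drops by %.2f%%\n",
        $p * 100, $saved * 100;
}
```

With those made-up inputs, the whole program would finish only about 0.83% or 4.17% sooner.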

Also, kcott's numbers above show a 981% difference (roughly 0.2ms vs 2ms for 20k lines of output, which is to say a fraction of a microsecond per line). But his test builds the long string with the x operator instead of doing 20k individual concatenations. Let's see what happens if we build the output string line by line instead, as the code in your original post does:

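#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use constant {
    LINES   => 20_000,              # lines of output per benchmark iteration
    RECORD  => 'X' x 100 . "\n",    # one 101-byte line ('x' binds tighter than '.')
};

use Benchmark 'cmpthese';

# Write to /dev/null so the timings are not dominated by disk writes.
open my $fh, '>>', '/dev/null';
my $out;

# A count of 0 tells cmpthese to run each sub for at least 3 CPU seconds.
cmpthese 0 => {
    # One print call per line.
    kcott_by_line => sub {
        print $fh RECORD for 1 .. LINES;
    },
    # Build the whole output with a single 'x', then print it once.
    kcott_concat => sub {
        print $fh RECORD x LINES;
    },
    # Append line by line into a buffer, then print it once.
    append_per_line => sub {
        $out = '';
        $out .= RECORD for 1 .. LINES;
        print $fh $out;
    },
};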

And the results:

                  Rate   kcott_by_line append_per_line    kcott_concat
kcott_by_line    716/s              --            -20%            -93%
append_per_line  900/s             26%              --            -91%
kcott_concat    9690/s           1254%            976%              --

That's only a 26% difference between printing line by line and appending line by line. It seems that the main reason a single print was so much faster in kcott's test is that it built the entire output string in one operation instead of handling each line of output separately, and that is not an optimization you would be able to apply in the case your question describes.
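One way to check that explanation would be to time only the string building, with no printing at all. This is a sketch using the same LINES and RECORD constants as the benchmark above; the sub names are made up here, and no results from it are reported in this thread.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Benchmark 'cmpthese';

use constant {
    LINES  => 20_000,
    RECORD => 'X' x 100 . "\n",
};

# Build the whole string with a single use of the x operator.
sub build_repeat {
    return RECORD x LINES;
}

# Build the same string one line at a time, as a program whose
# lines actually differ would have to.
sub build_append {
    my $out = '';
    $out .= RECORD for 1 .. LINES;
    return $out;
}

# Sanity check: both builders must produce identical strings.
die "builders disagree\n" unless build_repeat() eq build_append();

# Run each builder for at least 1 CPU second; nothing is printed to a file.
cmpthese(-1, {
    build_repeat => \&build_repeat,
    build_append => \&build_append,
});
```

If most of the gap between kcott_concat and append_per_line shows up here too, that would support the idea that the x operator, not the single print, accounts for the speedup.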

And, again translating this back into real numbers, the difference is 14.3 million lines/second printing them individually vs. an even 18 million lines/second if they're concatenated first: about 0.07 microseconds (70 ns) per line vs. 0.056 microseconds (56 ns) per line. That is a saving of roughly one second per 70 million lines of output, so it takes over four billion lines to make a one-minute difference.
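The conversion from benchmark rates to per-line costs is plain arithmetic. This sketch simply redoes it from the rates in the table above (716/s and 900/s, with 20,000 lines per iteration).

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use constant LINES => 20_000;    # lines printed per benchmark iteration

# Iterations per second, taken from the cmpthese table above.
my %rate = (
    kcott_by_line   => 716,
    append_per_line => 900,
);

my %ns_per_line;
for my $name (sort keys %rate) {
    my $lines_per_sec = $rate{$name} * LINES;
    $ns_per_line{$name} = 1e9 / $lines_per_sec;
    printf "%-15s %5.1f million lines/s, %4.1f ns/line\n",
        $name, $lines_per_sec / 1e6, $ns_per_line{$name};
}

# Time saved per line, and how many lines it takes to save
# one second and one minute.
my $saved_ns            = $ns_per_line{kcott_by_line} - $ns_per_line{append_per_line};
my $lines_per_saved_sec = 1e9 / $saved_ns;
printf "saved per line: %.1f ns\n", $saved_ns;
printf "lines to save 1 second: %.0f million\n", $lines_per_saved_sec / 1e6;
printf "lines to save 1 minute: %.1f billion\n", 60 * $lines_per_saved_sec / 1e9;
```

It reports about 69.8 ns/line for printing each line and 55.6 ns/line for appending first, a difference of roughly 14 ns per line, which works out to about 70 million lines per second saved and about 4.2 billion lines per minute saved.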

Whoopty-freaking-do.

How many times would each of those 100 users have to process their 700M input files for the aggregate difference to add up to the time you spent reading this reply, never mind the time I spent writing it?
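For a rough sense of scale, here is the same arithmetic applied to a single run. It assumes, hypothetically, that a 700M input produces about 700M of output in 101-byte lines like the benchmark's. The real output size was never stated, so treat the result as an order-of-magnitude guess. The per-line saving is derived from the 716/s and 900/s rates in the benchmark table.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical assumptions, not measurements:
use constant {
    OUTPUT_BYTES => 700_000_000,    # assume output is about as large as the 700M input
    LINE_BYTES   => 101,            # assume 100-character lines plus newline, as in the benchmark
    USERS        => 100,            # number of users mentioned in the question
};

# Per-line saving from the benchmark rates (716/s vs 900/s,
# 20,000 lines per iteration), in seconds.
my $saved_sec_per_line = 1 / (716 * 20_000) - 1 / (900 * 20_000);

my $lines         = OUTPUT_BYTES / LINE_BYTES;
my $saved_per_run = $lines * $saved_sec_per_line;

printf "lines per run:      %.1f million\n", $lines / 1e6;
printf "time saved per run: %.3f s\n",       $saved_per_run;
printf "saved if all %d users run once: %.1f s\n", USERS, USERS * $saved_per_run;
```

Under those assumptions, each run saves about a tenth of a second, or roughly 10 seconds in total if each of the 100 users runs it once.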

This kind of micro-optimization is just not worth it in 99% of cases; and for the other 1%, you'll get bigger gains by using C or a similar high-performance language instead of Perl, and then micro-optimizing the C code if you still need more speed at that point.

In reply to Re^3: Performance In Perl by dsheroh
in thread Performance In Perl by Mano_Man
