G'day Mano,

Welcome to the Monastery.

In general, calling print 20,000 times with individual records will be slower than calling it once with all records.

I ran the following Benchmark several times.

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use constant {
    LINES   => 20_000,
    RECORD  => 'X' x 100 . "\n",
}; 

use Benchmark 'cmpthese';

open my $fh, '>>', '/dev/null';

cmpthese 0 => {
    singly => sub {
        print $fh RECORD for 1 .. LINES;
    },
    concat => sub {
        print $fh join '', (RECORD) x LINES;
    },
    list => sub {
        print $fh +(RECORD) x LINES;
    },
    string => sub {
        print $fh RECORD x LINES;
    },
};

Here's a representative result:

         Rate singly   list concat string
singly  437/s     --   -64%   -71%   -91%
list   1205/s   176%     --   -20%   -74%
concat 1497/s   243%    24%     --   -68%
string 4720/s   981%   292%   215%     --

You didn't give any indication of record size (error messages can vary wildly in length): I just used 100 'X's (plus a newline). If that's a reasonable guess, I don't imagine you'd have any problem with ~2MB of data (either holding it in memory or passing it to print).

As you can see, printing every record singly was slower than the other methods, and printing one pre-built string (using the repetition operator) was the fastest by a wide margin. A single print with concatenated records appears a little faster than using a list; however, that wasn't the case in all runs: I'd consider these too close to call. Also bear in mind that, because I've used constant values, Perl may have performed some optimisations at compile time. Consider what other code is involved as you capture records and add them to a string or use them to populate an array.
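To get a rough feel for that capture cost, here's a minimal sketch of a variation on the benchmark. It's only an illustration under my own assumptions: the @records array is a hypothetical stand-in for real error messages (20,000 records of 20 to 199 'X's each, built at runtime rather than as constants), and the append and push steps are included inside the timed code.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use Benchmark 'cmpthese';

# Hypothetical stand-in for real error messages: 20,000 records of
# varying length (20 to 199 'X's plus a newline), built at runtime.
my @records = map { ('X' x (20 + $_ % 180)) . "\n" } 1 .. 20_000;

open my $fh, '>>', '/dev/null';

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese -1 => {
    # Print each record as soon as it is "captured".
    singly => sub {
        print $fh $_ for @records;
    },
    # Append each record to a string, then print once.
    string => sub {
        my $out = '';
        $out .= $_ for @records;
        print $fh $out;
    },
    # Push each record onto an array, then print the list once.
    list => sub {
        my @out;
        push @out, $_ for @records;
        print $fh @out;
    },
};
```

I haven't included results: they'll depend on your Perl version, your hardware and, most of all, on how closely the fake records resemble your real ones.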

There are some other factors to take into consideration. Is this a one-off run? If not, how frequently is it run? How long does the entire process take to run? Is it being run by multiple processes at the same time? Are there other users on the system? How might this affect them?

Although printing records individually may be slower in the benchmark scenario I present, if done correctly, this method should have a substantially smaller memory footprint. In addition, spreading the printing tasks over the life of the process may mean it plays more nicely with other, concurrent processes.
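As a rough sketch of what printing individually might look like, the code below writes each record as soon as it's produced, so only one record is held in memory at a time. The make_record_source function and its message text are hypothetical: they just stand in for whatever code actually captures your errors.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

# Hypothetical record source: returns an iterator that yields one
# record per call, then undef when there are no more.
sub make_record_source {
    my ($count) = @_;
    my $n = 0;
    return sub {
        return if $n >= $count;
        return sprintf "error %d: something went wrong\n", ++$n;
    };
}

open my $fh, '>>', '/dev/null';

my $next    = make_record_source(20_000);
my $written = 0;

# Print each record as it arrives; nothing is accumulated.
while (defined(my $record = $next->())) {
    print $fh $record;
    ++$written;
}

close $fh;

print "wrote $written records\n";
```

Unless you've turned on autoflush for the filehandle, Perl buffers the output, so 20,000 calls to print don't translate into 20,000 separate writes to the operating system.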

There's a fair amount to think about. I'd recommend writing your own benchmark, using more representative data, and running it in an environment that's closer to one in which the code will actually be run.

— Ken

In reply to Re: Performance In Perl by kcott
in thread Performance In Perl by Mano_Man
