Re^2: Performance In Perl

There are really two questions to answer:

Is it faster to call print/say once with several megabytes of data, or many more times with small amounts of data each time?
How big a string can you build before you run out of memory?

To answer the second question first, Perl has no built-in limits for string size. Try this simple test program using the splendid Devel::Size module:

use strict;
use warnings;

use Devel::Size qw(size);

my $str = '';

for (1 .. 1_000_000) {
    $str .= "x" x 80;
}

print "ASCII string has @{[ length($str) ]} characters and consumes @{[ size($str) ]} bytes\n";

$str = '';

for (1 .. 1_000_000) {
    $str .= "\N{U+1234}" x 80;
}

print "Unicode string has @{[ length($str) ]} characters and consumes @{[ size($str) ]} bytes\n";

On my machine, the output is

ASCII string has 80000000 characters and consumes 89779352 bytes
Unicode string has 80000000 characters and consumes 273984424 bytes

So even with one million 80-character lines, you're using only about 90 MB of RAM for the ASCII string and about 274 MB for the Unicode one.
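The roughly threefold gap between the two figures is what you would expect if Perl holds the second string internally as UTF-8, where U+1234 takes three bytes. Here is a minimal sketch to check the per-character cost. It uses the core Encode module, which is my addition and not part of the test above:

```perl
use strict;
use warnings;

use Encode qw(encode_utf8);

# Encode one character of each kind to UTF-8 and count the resulting bytes.
my $ascii_bytes = length(encode_utf8("x"));
my $wide_bytes  = length(encode_utf8("\N{U+1234}"));

print "x takes $ascii_bytes byte(s) in UTF-8, U+1234 takes $wide_bytes\n";
```

That accounts for about 240 MB of the Unicode figure. The rest is presumably spare capacity that Perl allocates as the string grows, though I haven't verified that.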

To answer the I/O speed question, you can benchmark it with a program like the one below:

use strict;
use warnings;
use feature "say";

use Benchmark;
use Devel::Size qw(size);

my $t0 = Benchmark->new;

my $total_bytes_chunked = 0;

for (1 .. 1_000_000) {
    my $str = 'x' x 80;
    #$total_bytes_chunked += size($str);
    say STDERR $str;
}

my $t1 = Benchmark->new;

my $str = '';

for (1 .. 1_000_000) {
    $str .= 'x' x 80 . "\n";
}

my $total_bytes_lump = 0;
#$total_bytes_lump = size($str);

print STDERR $str;

my $t2 = Benchmark->new;

say "Printing in small chunks ($total_bytes_chunked bytes): @{[ timestr(timediff($t1, $t0)) ]}";
say "Printing one big chunk ($total_bytes_lump bytes): @{[ timestr(timediff($t2, $t1)) ]}";

If you run it, redirect STDERR, or the comparison is meaningless: perl test.pl 2>/dev/null. Be warned that even then it may be a false comparison. On my machine, printing one big lump is faster than printing one million small chunks. However, if you uncomment the size() calls to see how much the total string sizes differ, you'll find the first loop suddenly takes four times longer, because it's doing a lot more work on each iteration.

Probably the only right way to answer your question is to try both in your program and with your input and see which one performs faster. It really depends on how much you can afford to keep in memory and how much computation you need to do for each individual chunk to print.
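As a starting point for that kind of test, here is a rough sketch using cmpthese from the core Benchmark module. The @lines data and the labels many_small and one_big are made up for illustration, so substitute your real data and output handle:

```perl
use strict;
use warnings;

use Benchmark qw(cmpthese);
use File::Spec;

# Hypothetical stand-in for real output: 100_000 lines of 80 characters.
my @lines = ('x' x 80) x 100_000;

# Write to the null device so terminal speed does not dominate the timing.
open my $out, '>', File::Spec->devnull or die "Cannot open null device: $!";

# Run each strategy for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    many_small => sub {
        print {$out} $_, "\n" for @lines;
    },
    one_big => sub {
        my $buf = '';
        $buf .= $_ . "\n" for @lines;
        print {$out} $buf;
    },
});

close $out or die "Cannot close null device: $!";
```

Keep in mind that a filehandle opened this way is buffered by default, whereas STDERR is not, so the gap you see here may be smaller than in the STDERR test above.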

Re^3: Performance In Perl by Mano_Man (Acolyte) on Mar 15, 2017 at 09:45 UTC
Thank you for the speedy replies. I've just checked it: the difference in performance is about 30%, which is a lot. Of course, for small prints this is negligible. Thank you!