There are really two questions to answer:

  1. Is it faster to call print/say once with several megabytes of data, or many more times with small amounts of data each time?
  2. How big a string can you build before you run out of memory?

To answer the second question first: Perl has no built-in limit on string size. The practical limit is the memory available to the process. Try this simple test program, which uses the splendid Devel::Size module:

```perl
use strict;
use warnings;
use Devel::Size qw(size);

# Build one million 80-character ASCII lines in a single string.
my $str = '';
for (1 .. 1_000_000) {
    $str .= "x" x 80;
}
print "ASCII string has @{[ length($str) ]} characters and consumes @{[ size($str) ]} bytes\n";

# Same again, but every character is U+1234 (three bytes in UTF-8).
$str = '';
for (1 .. 1_000_000) {
    $str .= "\N{U+1234}" x 80;
}
print "Unicode string has @{[ length($str) ]} characters and consumes @{[ size($str) ]} bytes\n";
```

On my machine, the output is:

```
ASCII string has 80000000 characters and consumes 89779352 bytes
Unicode string has 80000000 characters and consumes 273984424 bytes
```

So even with one million 80-character lines, the string takes only about 90 MB as ASCII, or about 275 MB when every character needs three bytes in Perl's internal UTF-8 encoding.
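The Unicode figure is roughly three times the ASCII one because U+1234 encodes to three bytes in UTF-8, while length() keeps counting characters. A minimal check of that, using only the core Encode module rather than Devel::Size (this snippet is an addition, not part of the original test):

```perl
use strict;
use warnings;
use Encode qw(encode);

# One 80-character line, once as ASCII and once as U+1234.
my $ascii   = "x" x 80;
my $unicode = "\N{U+1234}" x 80;

# length() counts characters; encoding to UTF-8 first gives the byte count.
printf "ASCII:   %d chars, %d UTF-8 bytes\n", length($ascii),   length(encode('UTF-8', $ascii));
printf "Unicode: %d chars, %d UTF-8 bytes\n", length($unicode), length(encode('UTF-8', $unicode));
```

It reports 80 characters for both strings, but 80 bytes for the ASCII line and 240 bytes for the U+1234 line. The rest of the gap in the Devel::Size numbers is per-string overhead and spare buffer space.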

To answer the I/O speed question, you can benchmark it with a program like this one:

```perl
use strict;
use warnings;
use feature "say";
use Benchmark;
use Devel::Size qw(size);

# Variant 1: print one million small chunks, one say() per line.
my $t0 = Benchmark->new;
my $total_bytes_chunked = 0;
for (1 .. 1_000_000) {
    my $str = 'x' x 80;
    #$total_bytes_chunked += size($str);
    say STDERR $str;
}

# Variant 2: build one big string, then print it with a single call.
my $t1 = Benchmark->new;
my $str = '';
for (1 .. 1_000_000) {
    $str .= 'x' x 80 . "\n";
}
my $total_bytes_lump = 0;
#$total_bytes_lump = size($str);
print STDERR $str;

my $t2 = Benchmark->new;
say "Printing in small chunks ($total_bytes_chunked bytes): @{[ timestr(timediff($t1, $t0)) ]}";
say "Printing one big chunk ($total_bytes_lump bytes): @{[ timestr(timediff($t2, $t1)) ]}";
```

When you run it, redirect STDERR, or you will mostly be timing your terminal and the comparison is meaningless: perl test.pl 2>/dev/null. Be warned that it may still be a false comparison. On my machine, printing one big lump is faster than printing one million small chunks. However, if you uncomment the size() calls to see how much the total string sizes differ, the first loop suddenly takes four times longer, because it now calls size() on every one of its million iterations.
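A variation that avoids some of those pitfalls is Benchmark's cmpthese, which runs each candidate repeatedly and prints a rate comparison table. The sketch below is an alternative to the program above, not the original author's code. The workload of 100_000 lines, the sub names print_chunked and print_lump, and writing to the null device instead of STDERR are all illustrative assumptions:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Spec;

# Assumed workload: 100_000 lines of 80 'x' characters plus a newline.
my $lines = 100_000;
my $line  = ('x' x 80) . "\n";

# Hypothetical helper: one print call per line.
sub print_chunked {
    my ($fh) = @_;
    print {$fh} $line for 1 .. $lines;
}

# Hypothetical helper: build one big string, then print it once
# (like the original, the timing includes building the string).
sub print_lump {
    my ($fh) = @_;
    my $str = '';
    $str .= $line for 1 .. $lines;
    print {$fh} $str;
}

# Write to the null device so terminal speed does not distort the timings.
open my $null, '>', File::Spec->devnull or die "Cannot open null device: $!";

# Run each variant for at least 2 CPU seconds and print a comparison table.
cmpthese(-2, {
    chunked => sub { print_chunked($null) },
    lump    => sub { print_lump($null) },
});

close $null;
```

Keep in mind that Perl buffers output to files and pipes by default (unless $| is set), so many small print calls do not mean many small write system calls; much of what the chunked variant pays for is per-call overhead.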

Probably the only reliable way to answer your question is to try both approaches in your own program, with your own input, and see which is faster. It depends on how much data you can afford to keep in memory and how much computation each chunk needs before it is printed.
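If neither extreme fits, one middle ground is to collect output in a buffer of bounded size and print it whenever it grows past a threshold, which caps memory use while keeping the number of print calls low. This is an illustration rather than something from the original post; the sub name print_batched and the 64 KiB threshold are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical flush threshold (in characters); tune it to the memory you can spare.
my $flush_at = 64 * 1024;

# Print $count generated lines to $fh, accumulating them in a buffer and
# writing only when the buffer reaches $flush_at characters.
sub print_batched {
    my ($fh, $count, $make_line) = @_;
    my $buf = '';
    for my $i (1 .. $count) {
        $buf .= $make_line->($i);
        if (length($buf) >= $flush_at) {
            print {$fh} $buf;
            $buf = '';
        }
    }
    print {$fh} $buf if length $buf;    # write whatever is left over
}

# Small usage example: prints "line 1", "line 2", "line 3" on separate lines.
print_batched(\*STDOUT, 3, sub { "line $_[0]\n" });
```

The callback keeps per-line computation separate from the buffering, so the same sub works whether each line is cheap or expensive to produce.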


In reply to Re^2: Performance In Perl by vrk
in thread Performance In Perl by Mano_Man
