in reply to pipes and return data

Not related to your question, but my pet peeve is code that processes data element by element when it could be handled all at once.

There's no need to process your output line by line. If you print an array on its own:

print PROG @array

the array elements are printed without any separator. If this is a CGI script producing HTML, the browser ignores newline separators anyway, so that should be the fastest solution. But for some odd reason, it is actually quite slow.
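A minimal sketch of that "no separator" behaviour, assuming Perl 5.8 or later for the in-memory filehandle; the HTML fragments in @array are made-up sample data:

```perl
use strict;
use warnings;

my @array = ('<p>one</p>', '<p>two</p>', '<p>three</p>');

# Write into an in-memory filehandle (Perl 5.8+) so the bytes can be inspected.
open my $fh, '>', \my $buffer or die "in-memory open failed: $!";

# print joins its LIST with $, (the output field separator), which is
# undefined by default, so the elements run together with nothing between them.
print {$fh} @array;
close $fh;

print "$buffer\n";    # prints <p>one</p><p>two</p><p>three</p>
```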

If it isn't HTML, and it isn't convenient to add newline characters when the array is generated, you'll need to introduce the newlines at the print. When you interpolate an array in a double-quoted string, the elements are joined with the value of the variable $" (a single space by default). So temporarily redefine $" to be a newline and use the array in a string; the string will need a newline at the end.

{ local $" = "\n"; print PROG "@array\n"; }
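To show that the change to $" really is temporary, here is a small sketch (the sample words are arbitrary):

```perl
use strict;
use warnings;

my @array = qw(alpha beta gamma);
my $inside;

{
    # local saves $" and restores it automatically when the block exits.
    local $" = "\n";
    $inside = "@array\n";
}

# Outside the block $" is back to its default, a single space.
my $outside = "@array\n";

print $inside;     # alpha, beta and gamma on separate lines
print $outside;    # alpha beta gamma
```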

That's equivalent to using join to convert the array into a string, and equally fast.

print PROG join( "\n", @array), "\n";
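One way to check that these approaches are interchangeable is to compare the bytes they produce. This sketch (sample data invented; the in-memory filehandle assumes Perl 5.8+) builds the output element by element, with join, and with a localized $":

```perl
use strict;
use warnings;

my @array = map { "line $_" } 1 .. 5;

# 1. Element by element, like the "manual" loop.
open my $fh, '>', \my $manual or die "in-memory open failed: $!";
print {$fh} "$_\n" for @array;
close $fh;

# 2. The built-in join.
my $joined = join("\n", @array) . "\n";

# 3. Interpolation with a localized list separator; the string is built
#    while $" is "\n", then $" is restored when the do block ends.
my $interpolated = do { local $" = "\n"; "@array\n" };

print "all three identical\n"
    if $manual eq $joined && $joined eq $interpolated;
```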

That makes me think the underlying code is essentially identical. Don't forget that Perl built-ins (and natively coded module routines) are fast, while re-implementing equivalent constructs in Perl is slower.

I timed the options with one hundred iterations of each on an array of one million elements. join and local are about 6 times as fast as manual looping, and about 3 times as fast as printing the array outside a string.

use Benchmark;
for ( 1..1000000 ) { push @a, "$i"; }
open PROG, ">/dev/nul";
timethese( 100], {
    manual => sub {for $line ( @a ){ print PROG "$line\n";}},
    join   => sub {print PROG join( "\n", @a ), "\n" },
    local  => sub {local $"="\n"; print PROG "@a\n";}
    html   => sub {print PROG @a;}
});

##### output ->

$ perl t.pl
Benchmark: timing 100 iterations of join, local, manual...
  html:133 wallclock secs (131.79 usr+ 0.15 sys = 131.94 CPU) @ 0.76/s (n=100)
  join: 45 wallclock secs (44.04 usr + 0.01 sys = 44.05 CPU) @ 2.27/s (n=100)
 local: 45 wallclock secs (44.51 usr + 0.00 sys = 44.51 CPU) @ 2.25/s (n=100)
manual:275 wallclock secs (274.17 usr+ 0.04 sys = 274.21 CPU) @ 0.36/s (n=100)
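For anyone who wants to rerun the comparison, here is a sketch of a corrected version of the benchmark. The changes are assumptions about what was intended: the data is built from $_ rather than the unset $i, the stray ']' and the missing comma after the local entry are fixed, the null device comes from File::Spec->devnull and the open is checked, and the variants are verified to produce the same bytes before being timed. Sizes default to 100,000 elements and 10 iterations so it finishes quickly; a hypothetical invocation like `perl bench.pl 1000000 100` restores the original scale. It needs Perl 5.10+ for the // operator.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Spec;

# Original scale was 1_000_000 elements and 100 iterations; smaller defaults
# here, overridable from the command line.
my $n_elements = $ARGV[0] // 100_000;
my $iterations = $ARGV[1] // 10;

# Build the data from $_ (the original's "$i" was never assigned).
my @a = map { "$_" } 1 .. $n_elements;

# Every variant prints to $fh, which points first at in-memory buffers
# (for a correctness check) and then at the null device (for timing).
my $fh;
my %variants = (
    manual => sub { print {$fh} "$_\n" for @a },
    join   => sub { print {$fh} join("\n", @a), "\n" },
    local  => sub { local $" = "\n"; print {$fh} "@a\n" },
    html   => sub { print {$fh} @a },    # no separators at all
);

# Check that manual, join and local emit identical bytes before timing them.
my %output;
for my $name (sort keys %variants) {
    open $fh, '>', \$output{$name} or die "in-memory open failed: $!";
    $variants{$name}->();
    close $fh;
}
die "manual, join and local disagree\n"
    unless $output{manual} eq $output{join} && $output{join} eq $output{local};

# Time against the null device, checking the open this time.
my $null = File::Spec->devnull;
open $fh, '>', $null or die "Cannot open $null: $!";
timethese($iterations, \%variants);
close $fh;
```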

--
TTTATCGGTCGTTATATAGATGTTTGCA

Re: Re: pipes and return data
by sauoq (Abbot) on Nov 02, 2003 at 05:48 UTC

    Please, if you are going to post a benchmark and results, do so responsibly. The code you posted has (at least) two errors:

    timethese( 100], {

    Where'd that spurious ']' come from? I realize this was probably a cut-n-paste error, but still...

    open PROG, ">/dev/nul";

    This one is more serious. It implies that you tested printing your strings to a filehandle that was never successfully opened. Ooops.
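    A minimal sketch of why the unchecked open matters: once an open fails, print on that handle just returns false and writes nothing, so the timing measures almost no work. The path is hypothetical, chosen so the open must fail:

    ```perl
    use strict;
    use warnings;

    # Hypothetical path in a directory that does not exist, so the open must fail.
    my $path = '/nonexistent-dir/out.txt';

    # Unchecked open, as in the benchmark: OUT is left without an open stream.
    open OUT, '>', $path;
    my $printed = do {
        no warnings qw(closed unopened);    # silence the warning print would emit
        print OUT "data\n";
    };

    # The checked form reports the failure instead of silently discarding output.
    my $error = '';
    open my $fh, '>', $path or $error = "cannot open $path: $!";

    print $printed ? "print succeeded\n" : "print failed silently\n";
    print "$error\n" if $error;
    ```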

    Benchmark: timing 100 iterations of join, local, manual...

    Wait... what about 'html'? So, I guess you ran that later and added it into your results?

    I probably would have missed all of this had I not been suspicious of your results. I would have guessed the 'html' version to be the fastest by far... so I had to check for myself. Here are the results I got, after correcting the errors noted above:

    Frankly, I still don't know how you managed to get such poor figures for the 'html' code.

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: Re: pipes and return data
by Anonymous Monk on Nov 02, 2003 at 04:08 UTC
    That's pretty cool. I know my post is semi-meaningless (technically, at least), but I really appreciate your input. I was not aware of this performance issue (nor the local trick). Thanks Tomd,
    me