
Mea culpa, fellow monk, but I must disagree with your answer. Yes, the benchmarks say something, but what they say, I feel, needs deeper interpretation.

As Russ said, split is incredibly well optimized. Most of the Perl internals are. Many C coders of wondrous talent have pored over the code to make it so. Your code demonstrates that the AM was not using the correct tool, which is an answer to an unasked question.

Your four pieces of code are doing radically different things. The regex stops after the first match, while split must work through the entire string. Until you compare apples to apples, no conclusion can be drawn. Let us run this test and do it correctly. Note the slight changes I made to the regex code; they should result in a better comparison.

#!/usr/local/bin/perl -w
use strict;
use Benchmark;

my $testlarge = "a " x 100000;
my $testsmall = "a b c d e f";
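# A negative count means: run each sub for at least that many CPU seconds.
timethese(-10,{
      # One/Two: split the whole string, then keep only the first field.
      One   => sub { my ($y) = (split(/\s+/,$testlarge))[0]; },
      Two   => sub { my ($y) = (split(/\s+/,$testsmall))[0]; },
      # Three/Four: /g in list context makes the regex walk the whole string too.
      Three => sub { my $y = ( $testsmall =~ (/([^\s]*)\s+/g))[0]; },
      Four  => sub { my $y = ( $testlarge =~ (/([^\s]*)\s+/g))[0]; },
            });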
mik@mach5:/home/mik/monk)./benchthis.pl
Benchmark: running Four, One, Three, Two, each for at least 10 CPU seconds...
      Four: 19 wallclock secs (18.38 usr +  0.02 sys = 18.40 CPU) @  1.14/s (n=21)
       One: 13 wallclock secs (12.72 usr +  0.00 sys = 12.72 CPU) @  2.12/s (n=27)
     Three: 12 wallclock secs (10.28 usr +  0.00 sys = 10.28 CPU) @ 12748.55/s (n=131071)
       Two: 11 wallclock secs (10.00 usr +  0.00 sys = 10.00 CPU) @ 18600.60/s (n=186006)
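
To see why the /g flag matters in the regex cases above, here is a minimal sketch (not part of the benchmark) that counts how many pieces each approach produces on the small test string. The variable names are made up for illustration only.

```perl
use strict;
use warnings;

my $text = "a b c d e f";

# Without /g, a list-context match returns the first capture and stops.
my @first_only = $text =~ /([^\s]*)\s+/;

# With /g in list context, the regex keeps matching through the whole string.
# The final "f" is not captured, because no whitespace follows it.
my @all_matches = $text =~ /([^\s]*)\s+/g;

# split with no LIMIT processes the whole string.
my @fields = split /\s+/, $text;

printf "single match: %d, /g match: %d, split: %d\n",
    scalar @first_only, scalar @all_matches, scalar @fields;
# prints "single match: 1, /g match: 5, split: 6"
```

The single match does a fraction of the work of the other two, which is why timing it against a full split says little about split itself.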

When comparing apples to apples, it seems split is highly optimized. This is more an issue of choosing the right tool for the job at hand.
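
As one possible illustration of "the right tool" when only the first field is wanted: split takes an optional third argument, LIMIT, that caps the number of fields it produces. The sketch below only checks that both forms return the same first field; it was not timed as part of the benchmark above, and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = "a " x 100_000;

# List slice, as in the benchmark: split builds every field, then [0] is taken.
my ($via_slice) = (split /\s+/, $text)[0];

# Explicit LIMIT of 2: at most two fields are produced, so split can stop
# after the first separator and leave the remainder unsplit.
my ($via_limit) = split /\s+/, $text, 2;

print "same first field\n" if $via_slice eq $via_limit;
```

According to perldoc -f split, assigning split directly to a list of variables with no LIMIT gets an implicit limit of one more than the number of variables; the list slice used in the benchmark does not benefit from that, so it always splits the full string.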

This rant brought to you by
mikfire

In reply to RE: Re: is split optimized? by mikfire
in thread is split optimized? by Anonymous Monk
