
Thanks, I had mistyped my numbers (I was on an old 15" monitor where the font is set to something like 3 pixels high and even 640x480 looked fuzzy, yeck).

Anyway, I noticed that there was some discrepancy between the relative speeds of regex versus split (that is, split used properly), and I wanted to see why. So first I added a few more tests:

        Four  => sub { $testlarge =~ /([^\s]*)\s+/; my $y = $1; },
        Eight => sub { my $y = $testlarge =~ /(\S+)/; },
        Six   => sub { my ($y) =  split(/\s+/, $testlarge,2)    },
        Seven => sub { my ($y) =  split(/\s+/, $testlarge)      }

I noticed that your regex was different, so I wanted to see if that was why things were slower (though I didn't think so, since yours was simpler).
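Since the variants are being compared head to head, it may help to check what each one actually leaves in `$y`. The following is only a sketch: the helper sub names (`four`, `eight`, `eight_list`, `six`, `seven`) are made up for illustration and wrap the same expressions as the tests above. Note that `Eight` assigns the match in scalar context, so `$y` receives the match result rather than the captured text; `eight_list` is an added list-context variant for comparison and is not part of the original benchmark.

```perl
use strict;
use warnings;

# Hypothetical helpers: each wraps one benchmark expression and
# returns whatever that expression stores in $y.
sub four       { my ($s) = @_; $s =~ /([^\s]*)\s+/; my $y = $1;   return $y }
sub eight      { my ($s) = @_; my $y   = $s =~ /(\S+)/;           return $y } # scalar context
sub eight_list { my ($s) = @_; my ($y) = $s =~ /(\S+)/;           return $y } # list context (not in the benchmark)
sub six        { my ($s) = @_; my ($y) = split(/\s+/, $s, 2);     return $y }
sub seven      { my ($s) = @_; my ($y) = split(/\s+/, $s);        return $y }

my $s = "a " x 100;

# Print the value each variant ends up with.
for my $pair ( [ Four => \&four ], [ Eight => \&eight ], [ 'Eight (list)' => \&eight_list ],
               [ Six  => \&six  ], [ Seven => \&seven ] ) {
    my ($name, $code) = @$pair;
    printf "%-12s => '%s'\n", $name, $code->($s);
}
```

On this input, every variant except the scalar-context `Eight` yields the first field `a`. That difference is small, but it is worth keeping in mind when reading the timings.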

Running with the string set to "a " x 100_000, I got these numbers:

     Eight: 11 wallclock secs ( 4.68 usr +  5.46 sys = 10.14 CPU) @ 450.69/s (n=4570)
      Four: 10 wallclock secs ( 4.79 usr +  5.21 sys = 10.00 CPU) @ 449.60/s (n=4496)
     Seven: 10 wallclock secs ( 7.28 usr +  2.84 sys = 10.12 CPU) @ 293.97/s (n=2975)
       Six: 10 wallclock secs ( 7.13 usr +  2.88 sys = 10.01 CPU) @ 293.81/s (n=2941)

So the regexes (even when using the same regex as yours) were still about 50% faster than split, rather than being 40% slower.
Next I reduced the size of my string to "a " x 100. Here I got these numbers:

     Eight: 12 wallclock secs (10.00 usr +  0.00 sys = 10.00 CPU) @ 130970.90/s (n=1309709)
      Four: 11 wallclock secs (10.39 usr +  0.00 sys = 10.39 CPU) @ 88839.36/s (n=923041)
     Seven: 10 wallclock secs (10.49 usr +  0.00 sys = 10.49 CPU) @ 122499.33/s (n=1285018)
       Six: 10 wallclock secs (10.54 usr +  0.00 sys = 10.54 CPU) @ 121918.22/s (n=1285018)

Now the regex code (yours) leads by less than 10%, and my regex trails by a good 30%. So I guess the conclusion is that regex performs better than split on large scalars? I don't feel like mucking around in the perl source code right now, so here is my guess as to why: it has nothing to do with the way regexes or splits actually process the data, but rather that split is probably receiving a copy of the data, whereas the regex is receiving a reference.
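One way to probe this guess without reading the perl source is to sweep the string length and watch how the gap moves. The sketch below assumes the core `Benchmark` module and a one-second run per variant (shorter than the roughly ten CPU seconds in the runs above); the variant names `regex` and `split2` and the chosen sizes are illustrative. It only shows how the timings scale with length and does not by itself prove where the time goes. As a side note, `perldoc -f split` documents that when the result is assigned to a list and LIMIT is omitted, the limit is treated as one more than the number of variables, which is consistent with `Six` and `Seven` timing nearly identically.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $testlarge;    # set for each size below; the closures see the current value

# Illustrative variant names; both take the first whitespace-delimited field.
my %variants = (
    regex  => sub { my ($y) = $testlarge =~ /(\S+)/ },
    split2 => sub { my ($y) = split(/\s+/, $testlarge, 2) },
);

for my $n (100, 10_000, 100_000) {
    $testlarge = "a " x $n;
    print qq{--- "a " x $n ---\n};
    # A negative count means: run each variant for at least that many CPU seconds.
    cmpthese(-1, \%variants);
}
```

If the split variant's rate drops roughly in proportion to the string length while the regex variant stays flat, that would point to per-call work on the whole string, which is at least compatible with the copy hypothesis.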

Cheers,
Gryn

In reply to More benchmarks and stuff by gryng
in thread is split optimized? by Anonymous Monk
