Re: is split optimized?

RE: Re: is split optimized? by eduardo (Curate) on Jul 14, 2000 at 06:37 UTC
well, according to the benchmark:

    #!/usr/bin/perl -w
    use strict;
    use Benchmark;

    my $x = "a b c d e f g";

    sub list_context {
        my $y = (split(/\s+/, $x))[0];
    }

    sub extra_argument {
        my $y = (split(/\s+/, $x, 2))[0];
    }

    timethese(-3, {
        "LIST CONTEXT"   => \&list_context,
        "EXTRA ARGUMENT" => \&extra_argument,
    });

    [ed@darkness ed]$ perl ./splittest.pl
    Benchmark: running EXTRA ARGUMENT, LIST CONTEXT, each for at least 3 CPU seconds...
    EXTRA ARGUMENT:  2 wallclock secs ( 3.14 usr +  0.00 sys =  3.14 CPU) @ 115563.69/s (n=362870)
      LIST CONTEXT:  3 wallclock secs ( 3.18 usr +  0.00 sys =  3.18 CPU) @ 57053.46/s (n=181430)

the extra argument version of split kicks the living crap out of not using it... so i guess, yes ;) it does make a difference!
Using the third argument for split by gryng (Hermit) on Jul 14, 2000 at 06:34 UTC
It does, here are some numbers:

     Six: 10 wallclock secs ( 4.82 usr +  5.18 sys = 10.00 CPU) @ 437.30/s (n=4373)
     One: 13 wallclock secs (12.92 usr +  0.01 sys = 12.93 CPU) @ 4.87/s (n=63)
    Five: 13 wallclock secs (13.39 usr +  0.00 sys = 13.39 CPU) @ 5.15/s (n=69)
    Four: 10 wallclock secs ( 4.74 usr +  5.56 sys = 10.30 CPU) @ 443.69/s (n=4570)

The code for this is:

    #!/usr/bin/perl -w
    use strict;
    use Benchmark;

    my $testlarge = "a " x 100000;
    my $testsmall = "a b c d e f";

    timethese(-10, {
        One   => sub { my ($y) = (split(/\s+/, $testlarge))[0]; },
        Two   => sub { my ($y) = (split(/\s+/, $testsmall))[0]; },
        Three => sub { $testsmall =~ /([^\s])\s+/; my $y = $1; },
        Four  => sub { $testlarge =~ /([^\s])\s+/; my $y = $1; },
        Five  => sub { my $y = split(/\s+/, $testlarge) },
        Six   => sub { my ($y) = split(/\s+/, $testlarge, 1) }
    });

Note that using an equivalent regex is slightly faster, but the split done properly (using the third argument) performs at the "correct" level. Thanks for submitting the correct answer :) hehe.

Chow,
Gryn
RE: Using the third argument for split by Abigail (Deacon) on Jul 14, 2000 at 10:09 UTC
"Thanks for submitting the correct answer :) hehe."

But you used it in an incorrect way. If the third argument is 1, it's effectively a no-op. The third argument does not mean to discard everything after the first field.

    my ($y) = split " ", "a b", 1;
    print $y;

will print `a b`, and not `a`. If you want to use only the first field, and use a third argument, just use:

    my ($y) = split " ", $string, 2;

That's right. No indexing required. But even the limit isn't required. Just the simple:

    my ($y) = split " ", $string;

will do. And because it is so simple, Perl can optimize that. Here's a benchmark program (there are brackets where indexing is used - for some reason, perlmonks strips them), and the results:

    #!/opt/perl/bin/perl -w
    use strict;
    use Benchmark;

    my $str = "a " x 6;

    timethese -100 => {
        index => sub {my ($y) = (split " " => $str) [0]},
        regex => sub {my ($y) = $str =~ /(\S+)/},
        limit => sub {my ($y) = (split " " => $str, 2) [0]},
        plain => sub {my ($y) = split " " => $str},
    }
    __END__
    Benchmark: running index, limit, plain, regex, each for at least 100 CPU seconds...
    index: 125 wallclock secs (105.53 usr +  0.00 sys = 105.53 CPU) @ 34487.35/s (n=3639450)
    regex: 121 wallclock secs (105.06 usr +  0.00 sys = 105.06 CPU) @ 43695.61/s (n=4590661)
    limit: 123 wallclock secs (104.03 usr +  0.02 sys = 104.05 CPU) @ 48699.04/s (n=5067135)
    plain: 120 wallclock secs (105.18 usr +  0.02 sys = 105.20 CPU) @ 52044.32/s (n=5475062)

The bottom line is, if you want Perl to do the optimizing, keep your code simple.

-- Abigail
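For readers who want to check the LIMIT behaviour described in this reply, here is a minimal sketch. The sample string and the variable names are made up for illustration and are not from the thread. As far as the `split` documentation goes, when the result is assigned to a list of scalars and LIMIT is omitted, Perl acts as if LIMIT were one more than the number of variables, which would explain why the plain form needs no explicit limit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for this illustration.
my $string = "a b c";

# LIMIT 1: at most one field is returned, so the string comes back unsplit.
my ($one) = split " ", $string, 1;

# LIMIT 2: the first field, then the remainder as a single second field.
my ($first, $rest) = split " ", $string, 2;

# No LIMIT: assigning to a one-element list still yields just the first field.
my ($plain) = split " ", $string;

printf "limit 1: <%s>\n", $one;                 # limit 1: <a b c>
printf "limit 2: <%s> <%s>\n", $first, $rest;   # limit 2: <a> <b c>
printf "plain:   <%s>\n", $plain;               # plain:   <a>
```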
Brackets by DrManhattan (Chaplain) on Jul 14, 2000 at 18:20 UTC
You can post code with brackets by enclosing it in a <CODE> </CODE> block. See the Site How To for more information.

- Matt
RE: Brackets by Abigail (Deacon) on Jul 15, 2000 at 00:18 UTC
More benchmarks and stuff by gryng (Hermit) on Jul 14, 2000 at 18:18 UTC
Thanks, I had mistyped my numbers (I was on an old 15" monitor where the font is set to like 3 pixels high and even 640x480 looked fuzzy, yeck). Anyway, I noticed that there was some discrepancy between the relative speeds of regex versus split (that is, split used properly), and I wanted to see why. So first I added a few more tests:

    Four  => sub { $testlarge =~ /([^\s])\s+/; my $y = $1; },
    Eight => sub { my $y = $testlarge =~ /(\S+)/; },
    Six   => sub { my ($y) = split(/\s+/, $testlarge, 2) },
    Seven => sub { my ($y) = split(/\s+/, $testlarge) }

I noticed that your regex was different, so I wanted to see if that was why things were slower (however, I didn't think so, since yours was simpler). Running with the string equal to "a " x 100000, I got these numbers:

    Eight: 11 wallclock secs ( 4.68 usr +  5.46 sys = 10.14 CPU) @ 450.69/s (n=4570)
     Four: 10 wallclock secs ( 4.79 usr +  5.21 sys = 10.00 CPU) @ 449.60/s (n=4496)
    Seven: 10 wallclock secs ( 7.28 usr +  2.84 sys = 10.12 CPU) @ 293.97/s (n=2975)
      Six: 10 wallclock secs ( 7.13 usr +  2.88 sys = 10.01 CPU) @ 293.81/s (n=2941)

So the regexes (even when using your regex) were still 50% faster than split, rather than being 40% slower. Next I reduced the size of my string to "a " x 100. Here I got these numbers:

    Eight: 12 wallclock secs (10.00 usr +  0.00 sys = 10.00 CPU) @ 130970.90/s (n=1309709)
     Four: 11 wallclock secs (10.39 usr +  0.00 sys = 10.39 CPU) @ 88839.36/s (n=923041)
    Seven: 10 wallclock secs (10.49 usr +  0.00 sys = 10.49 CPU) @ 122499.33/s (n=1285018)
      Six: 10 wallclock secs (10.54 usr +  0.00 sys = 10.54 CPU) @ 121918.22/s (n=1285018)

Now the regex code (yours) leads by less than 10%, and my regex trails by a good 30%. So, I guess the conclusion is that regex performs better than split on large scalars?

I don't feel like mucking in the perl source code right now, so my guess* as to why this is has nothing to do with the way regexes or splits actually process the data, but rather that split is probably receiving a copy of the data, whereas regex is receiving a reference.

Cheers,
Gryn
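Anyone wanting to repeat the size comparison on their own perl could start from a sketch like the following. It is not the code used for the numbers above. The sub names and the command-line repeat count are hypothetical, and `cmpthese` from the Benchmark module is swapped in for `timethese` only to get a side-by-side table. The regex case assigns in list context so that `$y` receives the captured field; a scalar assignment such as `my $y = $str =~ /(\S+)/` would store the match result instead. No particular ranking is claimed, since results will vary with perl version and string size.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Number of "a " repetitions; a hypothetical command-line knob, default 100.
my $repeat = @ARGV ? $ARGV[0] : 100;
my $str    = "a " x $repeat;

# Each sub returns the first whitespace-separated field of $str.
sub by_limit { my ($y) = split " ", $str, 2; return $y }
sub by_plain { my ($y) = split " ", $str;    return $y }
sub by_regex { my ($y) = $str =~ /(\S+)/;    return $y }

# Run each case for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    limit => \&by_limit,
    plain => \&by_plain,
    regex => \&by_regex,
});
```

Running it once with a small count and once with a large one (for example 100 and 100000) gives the two data points discussed in the post.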
RE: More benchmarks and stuff by Abigail (Deacon) on Jul 15, 2000 at 00:15 UTC
The conclusion of: More benchmarks and stuff by gryng (Hermit) on Jul 15, 2000 at 02:25 UTC