Well, according to the benchmark:
#!/usr/bin/perl -w
use strict;
use Benchmark;
my $x = "a b c d e f g";
sub list_context {
my $y = (split(/\s+/, $x))[0];
}
sub extra_argument {
my $y = (split(/\s+/, $x, 2))[0];
}
timethese(-3, {
"LIST CONTEXT" => \&list_context,
"EXTRA ARGUMENT" => \&extra_argument,
});
[ed@darkness ed]$ perl ./splittest.pl
Benchmark: running EXTRA ARGUMENT, LIST CONTEXT, each for at least 3 CPU seconds...
EXTRA ARGUMENT: 2 wallclock secs ( 3.14 usr + 0.00 sys = 3.14 CPU) @ 115563.69/s (n=362870)
LIST CONTEXT: 3 wallclock secs ( 3.18 usr + 0.00 sys = 3.18 CPU) @ 57053.46/s (n=181430)
The extra-argument version of split runs about twice as fast as the version without it (roughly 115,564/s versus 57,053/s).
So I guess, yes ;) it does make a difference!
It does, here are some numbers:
Six: 10 wallclock secs ( 4.82 usr + 5.18 sys = 10.00 CPU) @ 437.30/s (n=4373)
One: 13 wallclock secs (12.92 usr + 0.01 sys = 12.93 CPU) @ 4.87/s (n=63)
Five: 13 wallclock secs (13.39 usr + 0.00 sys = 13.39 CPU) @ 5.15/s (n=69)
Four: 10 wallclock secs ( 4.74 usr + 5.56 sys = 10.30 CPU) @ 443.69/s (n=4570)
The code for this is:
#!/usr/bin/perl -w
use strict;
use Benchmark;
my $testlarge = "a " x 100000;
my $testsmall = "a b c d e f";
timethese(-10,{
One => sub { my ($y) = (split(/\s+/,$testlarge))[0]; },
Two => sub { my ($y) = (split(/\s+/,$testsmall))[0]; },
Three => sub { $testsmall =~ /([^\s]*)\s+/; my $y = $1; },
Four => sub { $testlarge =~ /([^\s]*)\s+/; my $y = $1; },
Five => sub { my $y = split(/\s+/,$testlarge) },
Six => sub { my ($y) = split(/\s+/,$testlarge,1) }
});
Note that using an equivalent regex is slightly faster, but the split done properly (using the third argument) performs at the "correct" level.
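One thing to keep in mind when reading these numbers is that split behaves differently in scalar and list context. The following is a minimal sketch with a made-up sample string, not part of the benchmark itself; it assumes a reasonably modern Perl (5.12 or later), where split in scalar context simply returns the number of fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data for illustration only.
my $str = "a b c d e f";

# Scalar context: split returns the NUMBER of fields, not the first field.
my $count = split(/\s+/, $str);

# List context (note the parentheses around $first): the first field is assigned.
my ($first) = split(/\s+/, $str);

# Indexing a list slice also yields the first field.
my $sliced = (split(/\s+/, $str))[0];

print "count=$count first=$first sliced=$sliced\n";
# prints: count=6 first=a sliced=a
```

So a test written as my $y = split(...) measures counting the fields rather than fetching the first one.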
Thanks for submitting the correct answer :) hehe.
Chow,
Gryn
my ($y) = split " ", "a b", 1;
print $y;
will print a b, and not a.
If you want to use only the first field, and use a third
argument, just use:
my ($y) = split " ", $string, 2;
That's right. No indexing required. But even the limit isn't
required. Just the simple:
my ($y) = split " ", $string;
will do. And because it is so simple, Perl can optimize that.
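To make the limit semantics concrete, here is a small sketch (the sample string and variable names are made up for illustration). perlfunc documents that when split is assigned to a list of variables and no limit is given, Perl behaves as if the limit were one more than the number of variables, which is why the plain form does not need an explicit limit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data for illustration only.
my $string = "a b c";

my @one = split " ", $string, 1;   # limit 1: the whole string as one field
my @two = split " ", $string, 2;   # limit 2: first field plus the remainder
my @all = split " ", $string;      # no limit: every field
my ($y) = split " ", $string;      # list assignment: first field only

print "limit 1:  ", join("|", @one), "\n";   # limit 1:  a b c
print "limit 2:  ", join("|", @two), "\n";   # limit 2:  a|b c
print "no limit: ", join("|", @all), "\n";   # no limit: a|b|c
print "first:    $y\n";                      # first:    a
```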
Here's a benchmark program (there are brackets where indexing
is used - for some reason, perlmonks strip them), and the
results:
#!/opt/perl/bin/perl -w
use strict;
use Benchmark;
my $str = "a " x 6;
timethese -100 => {
index => sub {my ($y) = (split " " => $str) [0]},
regex => sub {my ($y) = $str =~ /(\S+)/},
limit => sub {my ($y) = (split " " => $str, 2) [0]},
plain => sub {my ($y) = split " " => $str},
}
__END__
Benchmark: running index, limit, plain, regex, each for at least 100 CPU seconds...
index: 125 wallclock secs (105.53 usr + 0.00 sys = 105.53 CPU) @ 34487.35/s (n=3639450)
regex: 121 wallclock secs (105.06 usr + 0.00 sys = 105.06 CPU) @ 43695.61/s (n=4590661)
limit: 123 wallclock secs (104.03 usr + 0.02 sys = 104.05 CPU) @ 48699.04/s (n=5067135)
plain: 120 wallclock secs (105.18 usr + 0.02 sys = 105.20 CPU) @ 52044.32/s (n=5475062)
The bottom line is, if you want Perl to do the optimizing,
keep your code simple.
-- Abigail
Thanks, I had mistyped my numbers (I was on an old 15" monitor where the font is set to like 3 pixels high and even 640x480 looked fuzzy, yeck).
Anyway, I noticed that there was some discrepancy between the relative speeds of regex versus split (that is, split used properly). I wanted to see why, so first I added a few more tests:
Four => sub { $testlarge =~ /([^\s]*)\s+/; my $y = $1; },
Eight => sub { my $y = $testlarge =~ /(\S+)/; },
Six => sub { my ($y) = split(/\s+/, $testlarge,2) },
Seven => sub { my ($y) = split(/\s+/, $testlarge) }
I noticed that your regex was different, so I wanted to see if that was why things were slower (however, I didn't think so, since yours was simpler).
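A side note on context when comparing regex tests: a match assigned to a plain scalar and a match assigned to a parenthesized list do not store the same thing. The sketch below uses a made-up sample string and only illustrates that difference; it makes no claim about how much it affects the timings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data for illustration only.
my $str = "a b c";

# Scalar context: the match returns true (1) on success, not the capture.
my $scalar = $str =~ /(\S+)/;

# List context: the match returns the captured groups, so $list gets "a".
my ($list) = $str =~ /(\S+)/;

print "scalar=$scalar list=$list\n";
# prints: scalar=1 list=a
```

A test written as my $y = $testlarge =~ /(\S+)/ therefore stores the success flag, while my ($y) = ... stores the first field.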
Running with the string equal to "a " x 100000, I got these numbers:
Eight: 11 wallclock secs ( 4.68 usr + 5.46 sys = 10.14 CPU) @ 450.69/s (n=4570)
Four: 10 wallclock secs ( 4.79 usr + 5.21 sys = 10.00 CPU) @ 449.60/s (n=4496)
Seven: 10 wallclock secs ( 7.28 usr + 2.84 sys = 10.12 CPU) @ 293.97/s (n=2975)
Six: 10 wallclock secs ( 7.13 usr + 2.88 sys = 10.01 CPU) @ 293.81/s (n=2941)
So both regexes (even the one using the same pattern as yours) were still about 50% faster than split, rather than being slower as in your benchmark.
Next I reduced the size of my string to "a " x 100. Here I got these numbers:
Eight: 12 wallclock secs (10.00 usr + 0.00 sys = 10.00 CPU) @ 130970.90/s (n=1309709)
Four: 11 wallclock secs (10.39 usr + 0.00 sys = 10.39 CPU) @ 88839.36/s (n=923041)
Seven: 10 wallclock secs (10.49 usr + 0.00 sys = 10.49 CPU) @ 122499.33/s (n=1285018)
Six: 10 wallclock secs (10.54 usr + 0.00 sys = 10.54 CPU) @ 121918.22/s (n=1285018)
Now the regex code (yours) leads by less than 10%, and my regex trails by a good 30%. So I guess the conclusion is that a regex performs better than split on large scalars? I don't feel like mucking around in the Perl source code right now, so this is only a guess: I suspect it has nothing to do with the way regexes or splits actually process the data, but rather that split is probably receiving a copy of the data, whereas the regex is receiving a reference.
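The copy-versus-reference guess is untested. One rough way to probe it is to time both approaches over several string lengths and see whose cost grows with the length. The following is only a sketch: the helper names first_by_split and first_by_regex, the sizes, and the iteration count are all made up for illustration, and the results will depend on the Perl version and machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical helpers, named for this sketch only. Both read the string
# through $_[0], which aliases the caller's variable (no copy on the call).
sub first_by_split { my ($y) = split " ", $_[0]; return $y }
sub first_by_regex { my ($y) = $_[0] =~ /(\S+)/; return $y }

# Arbitrary iteration count; raise it (or use a negative CPU-seconds
# value such as -3) for more reliable numbers.
my $iterations = 2000;

for my $n (100, 10_000, 100_000) {
    my $str = "a " x $n;
    print "--- string of $n fields ---\n";
    timethese($iterations, {
        "split" => sub { first_by_split($str) },
        "regex" => sub { first_by_regex($str) },
    });
}
```

If the rate of one approach drops sharply as the string grows while the other stays flat, that points at per-call work proportional to the string length (such as copying), though it would not say where in Perl that work happens.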
Cheers,
Gryn