Strange indeed. I get similar results, although with smaller differences. However, if I change the tests (but not the regexes or the data) slightly, I do get the results where unpack wins:
#! perl -slw
use strict;
use Benchmark qw [cmpthese];
our @data = map{
join' ', '2004-05-13', '14:02:00', ('blah') x (1 + rand (9))
} 1 .. 1000;
our (@greedy, @explicit, @unpack);
cmpthese (-1, {
greedy => '@greedy = map {/(^\S*)\s(\S*)\s(.*$)/} @data',
explicit => '@explicit = map {/(^\d{4}\-\d{2}\-\d{2})\s
(\d{2}:\d{2}:\d{2})\s(.*$)/x} @data',
unpack => '@unpack = map {unpack "A10 x A8 x A*" => $_} @data',
});
die unless "@greedy" eq "@explicit" &&
"@greedy" eq "@unpack";
__END__
           Rate explicit greedy unpack
explicit 86.1/s       --    -6%   -25%
greedy   91.6/s       6%     --   -20%
unpack    114/s      33%    25%     --
Abigail
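For readers who want to see what the benchmarked expressions actually extract, here is a minimal sketch that runs the same regexes and unpack template (plus fixed-offset substr calls) against a single line. The sample line is made up to match the shape of the benchmark data; it is not taken from anyone's real input.

```perl
use strict;
use warnings;

# Hypothetical sample line in the same shape as the benchmark data.
my $line = '2004-05-13 14:02:00 blah blah blah';

# Greedy regex: two runs of non-whitespace, then the rest of the line.
my @greedy = $line =~ /(^\S*)\s(\S*)\s(.*$)/;

# Explicit regex: spell out the digit patterns of the date and time.
my @explicit = $line =~ /(^\d{4}-\d{2}-\d{2})\s(\d{2}:\d{2}:\d{2})\s(.*$)/;

# unpack: 10 chars, skip 1, 8 chars, skip 1, then the remainder.
my @unpack = unpack 'A10 x A8 x A*', $line;

# substr: the same fixed offsets written out by hand.
my @substr = (
    substr( $line, 0, 10 ),
    substr( $line, 11, 8 ),
    substr( $line, 20 ),
);

# Each line printed should show the same three fields.
print join( '|', @$_ ), "\n" for \@greedy, \@explicit, \@unpack, \@substr;
```

All four approaches agree only because the date and time sit at fixed columns; the unpack and substr versions silently return wrong fields if a line deviates from that layout.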
Curiouser and curiouser. Changing the tests changes the relative performance of the methods, but it also slows all of them down.
I thought that using globals instead of lexicals might have been part of the difference, and it is, but only a small part.
#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];
our $TEST ||= 0;
our $N = $TEST ? 10 : $N || 1000;
our @data = map{ join' ', '2004-05-13', '14:02:00', ('blah') x (1+rand( 9 )) } 1 .. $N;
our (@greedy, @explicit, @unpack);
cmpthese( $TEST ? 1 : -1, {
our_g => '@greedy = map {/(^\S*)\s(\S*)\s(.*$)/} @data',
our_e => '@explicit = map {/(^\d{4}\-\d{2}\-\d{2})\s
(\d{2}:\d{2}:\d{2})\s(.*$)/x} @data',
our_u => '@unpack = map {unpack "A10 x A8 x A*" => $_} @data',
my_g => 'my @greedy = map {/(^\S*)\s(\S*)\s(.*$)/} @data',
my_e => 'my @explicit = map {/(^\d{4}\-\d{2}\-\d{2})\s
(\d{2}:\d{2}:\d{2})\s(.*$)/x} @data',
my_u => 'my @unpack = map {unpack "A10 x A8 x A*" => $_} @data',
greedy => q[
my( $date, $time, $text );
m[(^\S*)\s(\S*)\s(.*$)]
and ( $date, $time, $text ) = ( $1, $2, $3 )
# and $TEST and print "greedy: $date|$time|$text"
for @data;
],
explicit => q[
my( $date, $time, $text );
m[(^\d{4}\-\d{2}\-\d{2})\s(\d{2}:\d{2}:\d{2})\s(.*$)]
and ( $date, $time, $text ) = ( $1, $2, $3 )
# and $TEST and print "explicit: $date|$time|$text"
for @data;
],
unpackA => q[
use bytes;
my( $date, $time, $text );
( $date, $time, $text ) = unpack 'A10 x A8 x A*', $_
# and $TEST and print "unpackA: $date|$time|$text"
for @data;
],
substr => q[
use bytes;
my( $date, $time, $text );
( $date, $time, $text ) =
(
substr( $_, 0, 10 ),
substr( $_, 11, 8 ),
substr( $_, 20 )
)
# and $TEST and print "substr: $date|$time|$text"
for @data;
],
});
__END__
P:\test>362106
           Rate our_e our_g our_u my_e my_g my_u unpackA substr explicit greedy
our_e    72.4/s    --   -2%  -15% -28% -30% -43%    -55%   -73%     -77%   -79%
our_g    73.6/s    2%    --  -13% -27% -29% -42%    -54%   -73%     -77%   -79%
our_u    85.0/s   17%   15%    -- -16% -18% -33%    -47%   -69%     -73%   -75%
my_e      101/s   39%   37%   19%   --  -3% -20%    -37%   -63%     -68%   -71%
my_g      104/s   43%   41%   22%   3%   -- -18%    -35%   -62%     -67%   -70%
my_u      126/s   74%   71%   48%  25%  21%   --    -21%   -53%     -60%   -64%
unpackA   160/s  121%  117%   88%  59%  54%  27%      --   -41%     -49%   -54%
substr    270/s  273%  267%  218% 168% 160% 114%     69%     --     -14%   -22%
explicit  314/s  334%  327%  270% 212% 203% 149%     96%    16%       --    -9%
greedy    346/s  378%  370%  307% 243% 234% 175%    116%    28%      10%     --
It would be interesting to see the benchmark run on 5.6.2 (pre-unicodification), which I don't have installed currently.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
          Rate our_e our_g my_e explicit my_g greedy our_u unpackA my_u substr
our_e    126/s    --   -9% -20%     -25% -28%   -33%  -37%    -49% -54%   -56%
our_g    139/s   10%    -- -11%     -17% -21%   -26%  -30%    -44% -49%   -52%
my_e     158/s   25%   13%   --      -6% -10%   -17%  -21%    -36% -42%   -45%
explicit 168/s   33%   20%   6%       --  -5%   -12%  -16%    -32% -38%   -42%
my_g     175/s   39%   26%  11%       5%   --    -7%  -12%    -29% -36%   -39%
greedy   189/s   50%   36%  20%      13%   8%     --   -5%    -24% -30%   -34%
our_u    199/s   58%   43%  26%      19%  13%     5%    --    -20% -27%   -31%
unpackA  248/s   96%   78%  57%      48%  41%    31%   25%      --  -9%   -14%
my_u     272/s  116%   95%  73%      63%  55%    44%   37%     10%   --    -5%
substr   288/s  128%  106%  83%      72%  64%    52%   45%     16%   6%     --
And the same computer with perl 5.8.3 is _much_ slower.
           Rate our_e our_g my_e our_u my_g explicit unpackA greedy my_u substr
our_e    61.0/s    --   -9% -25%  -29% -33%     -41%    -48%   -48% -52%   -67%
our_g    67.3/s   10%    -- -17%  -21% -27%     -35%    -42%   -43% -47%   -64%
my_e     80.7/s   32%   20%   --   -5% -12%     -22%    -31%   -32% -36%   -57%
our_u    85.3/s   40%   27%   6%    --  -7%     -18%    -27%   -28% -32%   -54%
my_g     91.6/s   50%   36%  13%    7%   --     -12%    -21%   -23% -28%   -51%
explicit  104/s   70%   54%  28%   22%  13%       --    -11%   -12% -18%   -44%
unpackA   116/s   91%   73%  44%   36%  27%      12%      --    -2%  -8%   -37%
greedy    118/s   94%   76%  47%   39%  29%      14%      2%     --  -6%   -36%
my_u      126/s  107%   88%  57%   48%  38%      22%      9%     7%   --   -32%
substr    186/s  205%  176% 130%  118% 103%      79%     60%    57%  47%     --
I was going to suggest substr(), which is around twice as fast as any of these at extracting just these three perfectly formatted pieces of data, until I saw the real regex being used. I can't imagine that substr, unpack or any other rigidly formatted extraction method would be any use at all for that lot.
The thing that really queers the pitch is the quoted field two-thirds of the way down each line. Without that, you could use the magic form of split and it would (probably) be quicker. As it is, I can't see any way of improving on the performance of the long, but actually quite fast, regex.
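As a rough illustration of the point about split, here is a minimal sketch. It assumes the "magic form" means `split ' '` with a literal single-space string, which skips leading whitespace and splits on runs of whitespace. The sample lines are hypothetical and are not the real log format under discussion.

```perl
use strict;
use warnings;

# Hypothetical line with irregular spacing between the fields.
my $line = "2004-05-13   14:02:00 blah  blah blah";

# split ' ' splits on runs of whitespace; the limit of 3 leaves the
# remainder of the line untouched in the last field.
my ( $date, $time, $text ) = split ' ', $line, 3;
print "$date|$time|$text\n";

# A quoted field containing a space is cut in two, which is why this
# approach stops working once such a field appears mid-line.
my @fields = split ' ', 'alpha "two words" omega';
print scalar(@fields), " fields: @fields\n";
```

The second split yields four fields rather than three, since split has no notion of quoting.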