Re^2: fast greedy regex

I had the same thought, but unless I am doing something dumb (quite likely :), it seems, strangely, that unpack is slower than even the explicit regex.

#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];

our @data = map{ 
    join' ', '2004-05-13', '14:02:00', ('blah') x (1+rand( 9 )) 
} 1 .. 1000;

cmpthese( -1, {
    greedy => q[
        my( $date, $time, $text );
        
        m[(^\S*)\s(\S*)\s(.*$)]
            and ( $date, $time, $text ) = ( $1, $2, $3 )
#            and print "greedy: $date|$time|$text"
            for @data;
    ],
    explicit => q[
        my( $date, $time, $text );
        
        m[(^\d{4}\-\d{2}\-\d{2})\s(\d{2}:\d{2}:\d{2})\s(.*$)]
            and ( $date, $time, $text ) = ( $1, $2, $3 )
#            and print "explicit: $date|$time|$text"
            for @data;
    ],
    unpack => q[
        my( $date, $time, $text );

        ( $date, $time, $text ) = unpack 'A10 x A8 x A*', $_
#            and print "unpack: $date|$time|$text"
            for @data;
    ],
});
    
__END__
P:\test>362106
          Rate   unpack explicit   greedy
unpack   158/s       --     -41%     -53%
explicit 267/s      70%       --     -21%
greedy   338/s     114%      26%       --

What stupidity am I guilty of?

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail


Re: fast greedy regex by Abigail-II (Bishop) on Jun 07, 2004 at 22:42 UTC
Strange indeed. I get similar results, although with smaller differences. However, if I change the tests (but not the regexes or the data) slightly, I do get the results where unpack wins:

#! perl -slw
use strict;
use Benchmark qw [cmpthese];

our @data = map{
    join' ', '2004-05-13', '14:02:00', ('blah') x (1 + rand (9))
} 1 .. 1000;

our (@greedy, @explicit, @unpack);

cmpthese (-1, {
    greedy   => '@greedy   = map {/(^\S*)\s(\S*)\s(.*$)/} @data',
    explicit => '@explicit = map {/(^\d{4}\-\d{2}\-\d{2})\s
                                   (\d{2}:\d{2}:\d{2})\s(.*$)/x} @data',
    unpack   => '@unpack   = map {unpack "A10 x A8 x A*" => $_} @data',
});

die unless "@greedy" eq "@explicit" && "@greedy" eq "@unpack";

__END__
           Rate explicit greedy unpack
explicit 86.1/s       --    -6%   -25%
greedy   91.6/s       6%     --   -20%
unpack    114/s      33%    25%     --

Abigail
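The `die unless` line in the benchmark above only checks that the three approaches agree across the whole data set. As a minimal sketch of the same check on a single line (the sample line is an assumption in the shape of the benchmark data, not taken from the real log), the three extractions can be compared directly:

```perl
use strict;
use warnings;

# Assumed sample line, same layout as the generated benchmark data.
my $line = '2004-05-13 14:02:00 blah blah blah';

# The three extraction methods from the thread, applied in list context.
my @greedy   = $line =~ /(^\S*)\s(\S*)\s(.*$)/;
my @explicit = $line =~ /(^\d{4}-\d{2}-\d{2})\s(\d{2}:\d{2}:\d{2})\s(.*$)/;
my @unpack   = unpack 'A10 x A8 x A*', $line;

# Print each result with | separators so differences are easy to spot.
print join( '|', @$_ ), "\n" for \@greedy, \@explicit, \@unpack;
```

All three lines of output should be identical if the methods really are interchangeable on this input.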
Re^2: fast greedy regex by BrowserUk (Patriarch) on Jun 08, 2004 at 00:02 UTC
Curiouser and curiouser. The change of tests changes the relative performance of the methods, but it also slows all of them down. I thought that using globals instead of lexicals might have been part of the difference, and it is, but only a small part.

#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];

our $TEST ||= 0;
our $N = $TEST ? 10 : $N || 1000;

our @data = map{
    join' ', '2004-05-13', '14:02:00', ('blah') x (1+rand( 9 ))
} 1 .. $N;

our (@greedy, @explicit, @unpack);

cmpthese( $TEST ? 1 : -1, {
    our_g => '@greedy = map {/(^\S*)\s(\S*)\s(.*$)/} @data',
    our_e => '@explicit = map {/(^\d{4}\-\d{2}\-\d{2})\s
                                (\d{2}:\d{2}:\d{2})\s(.*$)/x} @data',
    our_u => '@unpack = map {unpack "A10 x A8 x A*" => $_} @data',
    my_g => 'my @greedy = map {/(^\S*)\s(\S*)\s(.*$)/} @data',
    my_e => 'my @explicit = map {/(^\d{4}\-\d{2}\-\d{2})\s
                                  (\d{2}:\d{2}:\d{2})\s(.*$)/x} @data',
    my_u => 'my @unpack = map {unpack "A10 x A8 x A*" => $_} @data',
    greedy => q[
        my( $date, $time, $text );

        m[(^\S*)\s(\S*)\s(.*$)]
            and ( $date, $time, $text ) = ( $1, $2, $3 )
#            and $TEST and print "greedy: $date|$time|$text"
            for @data;
    ],
    explicit => q[
        my( $date, $time, $text );

        m[(^\d{4}\-\d{2}\-\d{2})\s(\d{2}:\d{2}:\d{2})\s(.*$)]
            and ( $date, $time, $text ) = ( $1, $2, $3 )
#            and $TEST and print "explicit: $date|$time|$text"
            for @data;
    ],
    unpackA => q[
        use bytes;
        my( $date, $time, $text );

        ( $date, $time, $text ) = unpack 'A10 x A8 x A*', $_
#            and $TEST and print "unpackA: $date|$time|$text"
            for @data;
    ],
    substr => q[
        use bytes;
        my( $date, $time, $text );

        ( $date, $time, $text ) = (
            substr( $_, 0, 10 ), substr( $_, 11, 8 ), substr( $_, 20 )
        )
#            and $TEST and print "substr: $date|$time|$text"
            for @data;
    ],
});

__END__
P:\test>362106
           Rate our_e our_g our_u my_e my_g my_u unpackA substr explicit greedy
our_e    72.4/s    --   -2%  -15% -28% -30% -43%    -55%   -73%     -77%   -79%
our_g    73.6/s    2%    --  -13% -27% -29% -42%    -54%   -73%     -77%   -79%
our_u    85.0/s   17%   15%    -- -16% -18% -33%    -47%   -69%     -73%   -75%
my_e      101/s   39%   37%   19%   --  -3% -20%    -37%   -63%     -68%   -71%
my_g      104/s   43%   41%   22%   3%   -- -18%    -35%   -62%     -67%   -70%
my_u      126/s   74%   71%   48%  25%  21%   --    -21%   -53%     -60%   -64%
unpackA   160/s  121%  117%   88%  59%  54%  27%      --   -41%     -49%   -54%
substr    270/s  273%  267%  218% 168% 160% 114%     69%     --     -14%   -22%
explicit  314/s  334%  327%  270% 212% 203% 149%     96%    16%       --    -9%
greedy    346/s  378%  370%  307% 243% 234% 175%    116%    28%      10%     --

It would be interesting to see the benchmark run on 5.6.2 (pre-unicodification), which I don't have installed currently.
Re^3: fast greedy regex by borisz (Canon) on Jun 08, 2004 at 00:31 UTC
This is perl 5.6.1

           Rate our_e our_g my_e explicit my_g greedy our_u unpackA my_u substr
our_e     126/s    --   -9% -20%     -25% -28%   -33%  -37%    -49% -54%   -56%
our_g     139/s   10%    -- -11%     -17% -21%   -26%  -30%    -44% -49%   -52%
my_e      158/s   25%   13%   --      -6% -10%   -17%  -21%    -36% -42%   -45%
explicit  168/s   33%   20%   6%       --  -5%   -12%  -16%    -32% -38%   -42%
my_g      175/s   39%   26%  11%       5%   --    -7%  -12%    -29% -36%   -39%
greedy    189/s   50%   36%  20%      13%   8%     --   -5%    -24% -30%   -34%
our_u     199/s   58%   43%  26%      19%  13%     5%    --    -20% -27%   -31%
unpackA   248/s   96%   78%  57%      48%  41%    31%   25%      --  -9%   -14%
my_u      272/s  116%   95%  73%      63%  55%    44%   37%     10%   --    -5%
substr    288/s  128%  106%  83%      72%  64%    52%   45%     16%   6%     --

And the same computer with perl 5.8.3 is _much_ slower.

           Rate our_e our_g my_e our_u my_g explicit unpackA greedy my_u substr
our_e    61.0/s    --   -9% -25%  -29% -33%     -41%    -48%   -48% -52%   -67%
our_g    67.3/s   10%    -- -17%  -21% -27%     -35%    -42%   -43% -47%   -64%
my_e     80.7/s   32%   20%   --   -5% -12%     -22%    -31%   -32% -36%   -57%
our_u    85.3/s   40%   27%   6%    --  -7%     -18%    -27%   -28% -32%   -54%
my_g     91.6/s   50%   36%  13%    7%   --     -12%    -21%   -23% -28%   -51%
explicit  104/s   70%   54%  28%   22%  13%       --    -11%   -12% -18%   -44%
unpackA   116/s   91%   73%  44%   36%  27%      12%      --    -2%  -8%   -37%
greedy    118/s   94%   76%  47%   39%  29%      14%      2%     --  -6%   -36%
my_u      126/s  107%   88%  57%   48%  38%      22%      9%     7%   --   -32%
substr    186/s  205%  176% 130%  118% 103%      79%     60%    57%  47%     --

Boris
Re^4: fast greedy regex by BrowserUk (Patriarch) on Jun 08, 2004 at 01:20 UTC
Re^3: fast greedy regex by Anonymous Monk on Jun 08, 2004 at 09:20 UTC
I was going to suggest `substr()` (which is around twice as fast as any of these to extract just these three pieces of data, perfectly formatted) - until I saw the real regex being used. I can't imagine `substr`, `unpack` or any rigidly formatted extraction method is any use at all for that lot.
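To make the "rigidly formatted" caveat concrete, here is a small sketch of fixed-offset extraction with `substr`. The offsets are the ones used in the benchmark code earlier in the thread; both sample lines are assumptions for illustration, and the second (a single-digit hour) is a hypothetical deviation, not something taken from the real log.

```perl
use strict;
use warnings;

# Extract date, time and text at fixed offsets:
# date = 10 chars at offset 0, time = 8 chars at offset 11, text from offset 20.
sub fixed_fields {
    my ($line) = @_;
    return ( substr( $line, 0, 10 ), substr( $line, 11, 8 ), substr( $line, 20 ) );
}

# Assumed well-formed line: the offsets line up with the fields.
my @good = fixed_fields('2004-05-13 14:02:00 blah blah');
print join( '|', @good ), "\n";

# Hypothetical malformed line: everything after the date shifts by one
# character, and substr returns wrong fields without any warning.
my @bad = fixed_fields('2004-05-13 4:02:00 blah blah');
print join( '|', @bad ), "\n";
```

The second call still returns three strings, but the time field picks up a trailing space and the text field loses its first character, which is why a regex that actually validates the layout is the safer choice for less regular input.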
Re^4: fast greedy regex by BrowserUk (Patriarch) on Jun 08, 2004 at 09:38 UTC
The thing that really queers the pitch is the quoted field 2/3rds of the way down each line. Without that, you could use the magic form of split and it would (probably) be quicker. As it is, I can't see any way of improving the performance over the long, but actually quite fast, regex.
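For reference, the "magic form" is `split ' '` with a single-space string as the pattern, which splits on runs of whitespace and ignores leading whitespace. A minimal sketch follows; the first line is in the shape of the benchmark data, and the second, with a quoted field, is a made-up stand-in for the real log format, which is not shown in this thread.

```perl
use strict;
use warnings;

# Assumed sample line; the limit of 3 keeps the free text in one piece.
my $line = '2004-05-13 14:02:00 blah blah blah';
my ( $date, $time, $text ) = split ' ', $line, 3;
print "$date|$time|$text\n";

# Hypothetical line with a quoted field containing a space:
# split has no notion of quoting, so the field is cut in two.
my $quoted = 'alpha "two words" omega';
my @fields = split ' ', $quoted;
print scalar(@fields), " fields: ", join( '|', @fields ), "\n";
```

The second case yields four fields instead of three, which is the problem the quoted field causes for any plain whitespace split.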