PerlMonks  

Re: fast greedy regex

by Abigail-II (Bishop)
on Jun 07, 2004 at 22:42 UTC


in reply to Re^2: fast greedy regex
in thread fast greedy regex

Strange indeed. I get similar results, although with smaller differences. However, if I change the tests (but not the regexes or the data) slightly, I do get the results where unpack wins:
    #! perl -slw
    use strict;
    use Benchmark qw [cmpthese];

    our @data = map{
        join' ', '2004-05-13', '14:02:00', ('blah') x (1 + rand (9))
    } 1 .. 1000;

    our (@greedy, @explicit, @unpack);

    cmpthese (-1, {
        greedy   => '@greedy = map {/(^\S*)\s(\S*)\s(.*$)/} @data',
        explicit => '@explicit = map {/(^\d{4}\-\d{2}\-\d{2})\s
                                      (\d{2}:\d{2}:\d{2})\s(.*$)/x} @data',
        unpack   => '@unpack = map {unpack "A10 x A8 x A*" => $_} @data',
    });

    die unless "@greedy" eq "@explicit" && "@greedy" eq "@unpack";

    __END__
                Rate explicit greedy unpack
    explicit  86.1/s       --    -6%   -25%
    greedy    91.6/s       6%     --   -20%
    unpack     114/s      33%    25%     --
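For anyone puzzling over the unpack template: A10 takes 10 bytes (stripping trailing spaces), x skips one byte, A8 takes the next 8, x skips another, and A* takes whatever is left. The fixed offsets a substr version needs follow from the same layout. A minimal sketch, assuming every record has exactly this 10/1/8/1 layout; the sample line and variable names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample record with the same layout as @data above:
# a 10-char date, a space, an 8-char time, a space, then free text.
my $line = join ' ', '2004-05-13', '14:02:00', ('blah') x 3;

# A10 = 10 bytes (trailing spaces stripped), x = skip one byte,
# A8 = next 8 bytes, x = skip one byte, A* = the rest of the string.
my @by_unpack = unpack 'A10 x A8 x A*', $line;

# Same fields by explicit offsets: date at 0..9, time at 11..18,
# text from offset 20 to the end.
my @by_substr = ( substr( $line, 0, 10 ), substr( $line, 11, 8 ), substr( $line, 20 ) );

# The greedy regex from the benchmark, for comparison.
my @by_regex = $line =~ /(^\S*)\s(\S*)\s(.*$)/;

print join( '|', @by_unpack ), "\n";    # prints 2004-05-13|14:02:00|blah blah blah
```

Unlike the regexes, both unpack and substr rely on the layout being fixed; a record with a differently sized date or time field would be split silently at the wrong places.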

Abigail

Re^2: fast greedy regex
by BrowserUk (Patriarch) on Jun 08, 2004 at 00:02 UTC

    Curiouser and curiouser. The change of tests changes the relative performance of the methods, but it also slows all of them down.

    I thought that using globals instead of lexicals might have been part of the difference, and it is, but only a small part.

        #! perl -slw
        use strict;
        use Benchmark qw[ cmpthese ];

        our $TEST ||= 0;
        our $N = $TEST ? 10 : $N || 1000;

        our @data = map{
            join' ', '2004-05-13', '14:02:00', ('blah') x (1+rand( 9 ))
        } 1 .. $N;

        our (@greedy, @explicit, @unpack);

        cmpthese( $TEST ? 1 : -1, {
            our_g => '@greedy = map {/(^\S*)\s(\S*)\s(.*$)/} @data',
            our_e => '@explicit = map {/(^\d{4}\-\d{2}\-\d{2})\s
                          (\d{2}:\d{2}:\d{2})\s(.*$)/x} @data',
            our_u => '@unpack = map {unpack "A10 x A8 x A*" => $_} @data',
            my_g  => 'my @greedy = map {/(^\S*)\s(\S*)\s(.*$)/} @data',
            my_e  => 'my @explicit = map {/(^\d{4}\-\d{2}\-\d{2})\s
                          (\d{2}:\d{2}:\d{2})\s(.*$)/x} @data',
            my_u  => 'my @unpack = map {unpack "A10 x A8 x A*" => $_} @data',
            greedy => q[
                my( $date, $time, $text );
                m[(^\S*)\s(\S*)\s(.*$)]
                    and ( $date, $time, $text ) = ( $1, $2, $3 )
                #   and $TEST and print "greedy: $date|$time|$text"
                    for @data;
            ],
            explicit => q[
                my( $date, $time, $text );
                m[(^\d{4}\-\d{2}\-\d{2})\s(\d{2}:\d{2}:\d{2})\s(.*$)]
                    and ( $date, $time, $text ) = ( $1, $2, $3 )
                #   and $TEST and print "explicit: $date|$time|$text"
                    for @data;
            ],
            unpackA => q[
                use bytes;
                my( $date, $time, $text );
                ( $date, $time, $text ) = unpack 'A10 x A8 x A*', $_
                #   and $TEST and print "unpackA: $date|$time|$text"
                    for @data;
            ],
            substr => q[
                use bytes;
                my( $date, $time, $text );
                ( $date, $time, $text ) = (
                    substr( $_, 0, 10 ), substr( $_, 11, 8 ), substr( $_, 20 )
                )
                #   and $TEST and print "substr: $date|$time|$text"
                    for @data;
            ],
        });

        __END__
        P:\test>362106
                    Rate our_e our_g our_u my_e my_g my_u unpackA substr explicit greedy
        our_e     72.4/s    --   -2%  -15% -28% -30% -43%    -55%   -73%     -77%   -79%
        our_g     73.6/s    2%    --  -13% -27% -29% -42%    -54%   -73%     -77%   -79%
        our_u     85.0/s   17%   15%    -- -16% -18% -33%    -47%   -69%     -73%   -75%
        my_e       101/s   39%   37%   19%   --  -3% -20%    -37%   -63%     -68%   -71%
        my_g       104/s   43%   41%   22%   3%   -- -18%    -35%   -62%     -67%   -70%
        my_u       126/s   74%   71%   48%  25%  21%   --    -21%   -53%     -60%   -64%
        unpackA    160/s  121%  117%   88%  59%  54%  27%      --   -41%     -49%   -54%
        substr     270/s  273%  267%  218% 168% 160% 114%     69%     --     -14%   -22%
        explicit   314/s  334%  327%  270% 212% 203% 149%     96%    16%       --    -9%
        greedy     346/s  378%  370%  307% 243% 234% 175%    116%    28%      10%     --

    It would be interesting to see the benchmark run on 5.6.2 (pre-unicodification), which I don't have installed currently.
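    A variant that sidesteps the our/my question altogether (a sketch, not what was run above): cmpthese also accepts code references, so @data can be a lexical and each candidate a closure over it, with no string eval involved. The three candidates and their names are illustrative only, and the ranking may well differ between perl builds and machines.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Test data shaped like the thread's: a date, a time, then 1 to 9 words.
my @data = map {
    join ' ', '2004-05-13', '14:02:00', ('blah') x ( 1 + rand 9 )
} 1 .. 1000;

# Each candidate is a closure over the lexical @data. It builds a lexical
# result array (like the my_* cases) and returns a reference to it.
my %candidates = (
    greedy => sub {
        my @r = map { /(^\S*)\s(\S*)\s(.*$)/ } @data;
        return \@r;
    },
    unpack => sub {
        my @r = map { unpack 'A10 x A8 x A*', $_ } @data;
        return \@r;
    },
    substr => sub {
        my @r = map {
            substr( $_, 0, 10 ), substr( $_, 11, 8 ), substr( $_, 20 )
        } @data;
        return \@r;
    },
);

# Sanity check before timing: every candidate must extract the same fields.
my @expected = @{ $candidates{greedy}->() };
for my $name ( sort keys %candidates ) {
    my @got = @{ $candidates{$name}->() };
    die "$name disagrees with greedy\n" unless "@got" eq "@expected";
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, \%candidates );
```

    Because the closures are compiled once along with the rest of the script, this also removes any difference in how the eval'd strings are compiled between the our_* and my_* cases.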


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
      This is perl 5.6.1
                  Rate our_e our_g my_e explicit my_g greedy our_u unpackA my_u substr
          our_e      126/s    --   -9% -20%     -25% -28%   -33%  -37%    -49% -54%   -56%
          our_g      139/s   10%    -- -11%     -17% -21%   -26%  -30%    -44% -49%   -52%
          my_e       158/s   25%   13%   --      -6% -10%   -17%  -21%    -36% -42%   -45%
          explicit   168/s   33%   20%   6%       --  -5%   -12%  -16%    -32% -38%   -42%
          my_g       175/s   39%   26%  11%       5%   --    -7%  -12%    -29% -36%   -39%
          greedy     189/s   50%   36%  20%      13%   8%     --   -5%    -24% -30%   -34%
          our_u      199/s   58%   43%  26%      19%  13%     5%    --    -20% -27%   -31%
          unpackA    248/s   96%   78%  57%      48%  41%    31%   25%      --  -9%   -14%
          my_u       272/s  116%   95%  73%      63%  55%    44%   37%     10%   --    -5%
          substr     288/s  128%  106%  83%      72%  64%    52%   45%     16%   6%     --
      And the same computer with perl 5.8.3 is _much_ slower.
                  Rate our_e our_g my_e our_u my_g explicit unpackA greedy my_u substr
          our_e     61.0/s    --   -9% -25%  -29% -33%     -41%    -48%   -48% -52%   -67%
          our_g     67.3/s   10%    -- -17%  -21% -27%     -35%    -42%   -43% -47%   -64%
          my_e      80.7/s   32%   20%   --   -5% -12%     -22%    -31%   -32% -36%   -57%
          our_u     85.3/s   40%   27%   6%    --  -7%     -18%    -27%   -28% -32%   -54%
          my_g      91.6/s   50%   36%  13%    7%   --     -12%    -21%   -23% -28%   -51%
          explicit   104/s   70%   54%  28%   22%  13%       --    -11%   -12% -18%   -44%
          unpackA    116/s   91%   73%  44%   36%  27%      12%      --    -2%  -8%   -37%
          greedy     118/s   94%   76%  47%   39%  29%      14%      2%     --  -6%   -36%
          my_u       126/s  107%   88%  57%   48%  38%      22%      9%     7%   --   -32%
          substr     186/s  205%  176% 130%  118% 103%      79%     60%    57%  47%     --
      Boris

        Thanks for running the benchmarks :)

        That pretty much reflects what I've thought for a while, despite arguments to the contrary.

        There is a substantial penalty to Unicode support for many string operations, even when the strings involved do not, have not and could not contain Unicode.

        I wish it were possible to reliably and manually 'turn off' all Unicode processing and conditional testing with a pragma (say, no utf8; or use bytes;) and recover the 5.6.x performance.
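        One thing that can at least be checked is whether the benchmark strings carry perl's internal UTF-8 flag at all. A minimal sketch for perl 5.8.1 or later, using the built-in utf8::is_utf8, utf8::upgrade and utf8::downgrade functions. It only inspects and toggles the flag, so it is not the 'turn off Unicode processing' switch wished for above, but it does give a flagged twin of the data (@upgraded, a name made up here) for side-by-side timing.

```perl
use strict;
use warnings;

# Data built the same way as in the benchmarks: ASCII-only literals.
my @data = map {
    join ' ', '2004-05-13', '14:02:00', ('blah') x ( 1 + rand 9 )
} 1 .. 1000;

# utf8::is_utf8 reports the internal UTF-8 flag; count flagged records.
my $flagged = grep { utf8::is_utf8($_) } @data;
print "UTF-8 flagged records: $flagged of ", scalar(@data), "\n";

# An upgraded copy holds the same characters but carries the flag,
# giving a character-string twin of @data for side-by-side timing.
my @upgraded = @data;
utf8::upgrade($_) for @upgraded;
my $flagged_up = grep { utf8::is_utf8($_) } @upgraded;

# utf8::downgrade clears the flag again; without a second (FAIL_OK)
# argument it dies if any character does not fit in a single byte.
my @downgraded = @upgraded;
utf8::downgrade($_) for @downgraded;
my $flagged_down = grep { utf8::is_utf8($_) } @downgraded;
```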


