Curiouser and curiouser. Changing the tests changes the relative performance of the methods, but it also slows all of them down.

I thought that using globals instead of lexicals might have been part of the difference, and it is, but only a small part.

#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];

our $TEST ||= 0;
our $N = $TEST ? 10 : $N || 1000;

our @data = map{
    join' ', '2004-05-13', '14:02:00', ('blah') x (1+rand( 9 ))
} 1 .. $N;

our (@greedy, @explicit, @unpack);

cmpthese( $TEST ? 1 : -1, {
    our_g => '@greedy = map {/(^\S*)\s(\S*)\s(.*$)/} @data',
    our_e => '@explicit = map {/(^\d{4}\-\d{2}\-\d{2})\s
                (\d{2}:\d{2}:\d{2})\s(.*$)/x} @data',
    our_u => '@unpack = map {unpack "A10 x A8 x A*" => $_} @data',

    my_g => 'my @greedy = map {/(^\S*)\s(\S*)\s(.*$)/} @data',
    my_e => 'my @explicit = map {/(^\d{4}\-\d{2}\-\d{2})\s
                (\d{2}:\d{2}:\d{2})\s(.*$)/x} @data',
    my_u => 'my @unpack = map {unpack "A10 x A8 x A*" => $_} @data',

    greedy => q[
        my( $date, $time, $text );
        m[(^\S*)\s(\S*)\s(.*$)]
            and ( $date, $time, $text ) = ( $1, $2, $3 )
#            and $TEST and print "greedy: $date|$time|$text"
            for @data;
    ],
    explicit => q[
        my( $date, $time, $text );
        m[(^\d{4}\-\d{2}\-\d{2})\s(\d{2}:\d{2}:\d{2})\s(.*$)]
            and ( $date, $time, $text ) = ( $1, $2, $3 )
#            and $TEST and print "explicit: $date|$time|$text"
            for @data;
    ],
    unpackA => q[
        use bytes;
        my( $date, $time, $text );
        ( $date, $time, $text ) = unpack 'A10 x A8 x A*', $_
#            and $TEST and print "unpackA: $date|$time|$text"
            for @data;
    ],
    substr => q[
        use bytes;
        my( $date, $time, $text );
        ( $date, $time, $text ) = (
            substr( $_, 0, 10 ), substr( $_, 11, 8 ), substr( $_, 20 )
        )
#            and $TEST and print "substr: $date|$time|$text"
            for @data;
    ],
});

__END__
P:\test>362106
            Rate our_e our_g our_u my_e my_g my_u unpackA substr explicit greedy
our_e     72.4/s    --   -2%  -15% -28% -30% -43%    -55%   -73%     -77%   -79%
our_g     73.6/s    2%    --  -13% -27% -29% -42%    -54%   -73%     -77%   -79%
our_u     85.0/s   17%   15%    -- -16% -18% -33%    -47%   -69%     -73%   -75%
my_e       101/s   39%   37%   19%   --  -3% -20%    -37%   -63%     -68%   -71%
my_g       104/s   43%   41%   22%   3%   -- -18%    -35%   -62%     -67%   -70%
my_u       126/s   74%   71%   48%  25%  21%   --    -21%   -53%     -60%   -64%
unpackA    160/s  121%  117%   88%  59%  54%  27%      --   -41%     -49%   -54%
substr     270/s  273%  267%  218% 168% 160% 114%     69%     --     -14%   -22%
explicit   314/s  334%  327%  270% 212% 203% 149%     96%    16%       --    -9%
greedy     346/s  378%  370%  307% 243% 234% 175%    116%    28%      10%     --
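
For anyone who wants to check that the four extraction styles in the benchmark really are doing the same job before comparing their speeds, here is a minimal sketch. It is not part of the benchmark: the sample line and the %got hash are made up for illustration, though the line has the same shape as the generated @data.

```perl
#! perl
use strict;
use warnings;

# One sample line in the same shape as the benchmark data
# (an assumed example, not taken from an actual run).
my $line = '2004-05-13 14:02:00 blah blah blah';

# Split the line into date, time and text with each of the four methods.
my %got = (
    greedy   => [ $line =~ /(^\S*)\s(\S*)\s(.*$)/ ],
    explicit => [ $line =~ /(^\d{4}\-\d{2}\-\d{2})\s(\d{2}:\d{2}:\d{2})\s(.*$)/ ],
    unpackA  => [ unpack 'A10 x A8 x A*', $line ],
    substr   => [ substr( $line, 0, 10 ), substr( $line, 11, 8 ), substr( $line, 20 ) ],
);

# Print the fields each method produced, one method per line.
print "$_: ", join( '|', @{ $got{$_} } ), "\n" for sort keys %got;
```

Every method should print the same three fields for this line. Note that unpack and substr rely on the fixed column positions of the date and time, so they only agree with the regexes while the input keeps that layout.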

It would be interesting to see the benchmark run on 5.6.2 (pre-unicodification), which I don't have installed currently.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

In reply to Re^2: fast greedy regex by BrowserUk
in thread fast greedy regex by js1
