in reply to Re: string pattern match, limited to first 1000 characters?
in thread string pattern match, limited to first 1000 characters?

What do you make of this benchmark? :)

#!/usr/bin/perl
use Benchmark qw(cmpthese);

$substr = join('','a'..'j');
$str = $substr x 90 . 'hTmL'. $substr x 2000;
print "length of search string: ",length $str, "\n";

cmpthese(-3, {
    ' regex ' => sub {
        $str =~ /^\A.{0,996}?html/si or
        $str =~ /^\A.{0,996}?sgml/si;
    },
    ' substr' => sub {
        substr( $str, 0, 1000) =~ /html/i or
        substr( $str, 0, 1000) =~ /sgml/i;
    },
    'index' => sub {
        1+index( lc substr( $str, 0, 1000 ), 'html' ) or
        1+index( lc substr( $str, 0, 1000 ), 'sgml' )
    },
});
print substr $str, 900, 10;
__END__
C:\test>junk
length of search string: 20904
            Rate   regex  substr   index
 regex   93505/s      --    -54%    -65%
 substr 204367/s    119%      --    -23%
index   266809/s    185%     31%      --
hTmLabcdef

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^3: string pattern match, limited to first 1000 characters?
by shmem (Chancellor) on Jun 23, 2007 at 10:27 UTC
    The conclusions are obvious, aren't they?
    • don't use a regexp when all you need is index
    • don't use a regexp with a quantifier pattern if you can substr
    • simple tools are fastest for simple tasks

    Did I miss some?
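
    A minimal sketch of the first two points combined, assuming a hypothetical helper named head_contains (it does not appear in the thread): take the prefix with substr once, lowercase it once, and test each word with index instead of a regexp. The test string is built the same way as in the benchmark above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true if any of @words occurs,
# case-insensitively, within the first $limit characters of $text.
sub head_contains {
    my ($text, $limit, @words) = @_;
    my $head = lc substr($text, 0, $limit);   # copy and lowercase the prefix once
    for my $word (@words) {
        return 1 if index($head, lc $word) >= 0;
    }
    return 0;
}

my $chunk = join '', 'a' .. 'j';
my $str   = $chunk x 90 . 'hTmL' . $chunk x 2000;   # same test string as the benchmark

# 'hTmL' starts at offset 900, so it lies inside the first 1000 characters
# but outside the first 900.
print head_contains($str, 1000, 'html', 'sgml') ? "found\n" : "not found\n";
print head_contains($str,  900, 'html', 'sgml') ? "found\n" : "not found\n";
```

    Unlike the benchmarked subs, this takes the prefix only once per call, so its timings would not be directly comparable to the figures above.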

    Nice that you combined the positive and negative searches into one, so one can see the average of both.

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re^3: string pattern match, limited to first 1000 characters?
by roboticus (Chancellor) on Jun 23, 2007 at 12:34 UTC

    BrowserUk:

    Ummm ... you want to reverse those or clauses, otherwise the second clause in your or won't run.
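
    To illustrate the short-circuit point, here is a small sketch that counts how often each search actually runs. The wrapper name found and the %calls counter are assumptions made for this example only; the search expression is the index variant from the benchmark.

```perl
use strict;
use warnings;

my $chunk = join '', 'a' .. 'j';
my $str   = $chunk x 90 . 'hTmL' . $chunk x 2000;

my %calls;   # how many times each needle was actually searched for

# Hypothetical wrapper around the index-based test, counting its invocations.
sub found {
    my ($needle) = @_;
    $calls{$needle}++;
    return 1 + index( lc substr( $str, 0, 1000 ), $needle );
}

found('html') or found('sgml');   # 'html' hits, so 'sgml' is never searched
print "html first: ", join(', ', map { "$_=" . ($calls{$_} // 0) } qw(html sgml)), "\n";

%calls = ();
found('sgml') or found('html');   # 'sgml' misses, so both searches run
print "sgml first: ", join(', ', map { "$_=" . ($calls{$_} // 0) } qw(html sgml)), "\n";
```

    With the matching word tested first, only one search is ever timed; with the non-matching word first, both are.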

    #!/usr/bin/perl
    use Benchmark qw(cmpthese);

    $substr = join('','a'..'j');
    $str = $substr x 90 . 'hTmL'. $substr x 2000;
    print "length of search string: ",length $str, "\n";

    cmpthese(-3, {
        'Fregex ' => sub {
            $str =~ /^\A.{0,996}?html/si or
            $str =~ /^\A.{0,996}?sgml/si;
        },
        'Fsubstr' => sub {
            substr( $str, 0, 1000) =~ /sgml/i or
            substr( $str, 0, 1000) =~ /html/i;
        },
        'Findex ' => sub {
            1+index( lc substr( $str, 0, 1000 ), 'sgml' ) or
            1+index( lc substr( $str, 0, 1000 ), 'html' )
        },
        'Rregex ' => sub {
            $str =~ /^\A.{0,996}?sgml/si or
            $str =~ /^\A.{0,996}?html/si;
        },
        'Rsubstr' => sub {
            substr( $str, 0, 1000) =~ /sgml/i or
            substr( $str, 0, 1000) =~ /html/i;
        },
        'Rindex' => sub {
            1+index( lc substr( $str, 0, 1000 ), 'sgml' ) or
            1+index( lc substr( $str, 0, 1000 ), 'html' )
        },
    });
    print substr $str, 900, 10;
    __END__
    root@swill ~/PerlMonks$ ./string_search2.pl
    length of search string: 20904
                Rate  Rregex  Fregex Rsubstr Fsubstr  Findex  Rindex
    Rregex   48562/s      --    -35%    -54%    -54%    -55%    -55%
    Fregex   75225/s     55%      --    -28%    -29%    -30%    -30%
    Rsubstr 104700/s    116%     39%      --     -1%     -2%     -3%
    Fsubstr 105896/s    118%     41%      1%      --     -1%     -1%
    Findex  107056/s    120%     42%      2%      1%      --     -0%
    Rindex  107434/s    121%     43%      3%      1%      0%      --
    hTmLabcdef
    root@swill ~/PerlMonks$

    Update: Had I used my brain, I'd've changed the $str definition to use 'sGmL' rather than edit the function definitions....
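
    A rough sketch of what that alternative might look like, assuming hypothetical helpers make_str and search: the search code stays fixed and only the marker embedded in the string changes, so there is no pair of near-identical subs to keep in sync by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: build the benchmark string with a chosen marker
# placed after 900 characters of filler.
sub make_str {
    my ($marker) = @_;
    my $chunk = join '', 'a' .. 'j';
    return $chunk x 90 . $marker . $chunk x 2000;
}

my %input = (
    first_hits   => make_str('hTmL'),   # first alternative ('html') matches
    first_misses => make_str('sGmL'),   # only the second alternative ('sgml') matches
);

# The search itself is unchanged between the two cases.
sub search {
    my ($str) = @_;
    return 1 + index( lc substr( $str, 0, 1000 ), 'html' )
        || 1 + index( lc substr( $str, 0, 1000 ), 'sgml' );
}

for my $case (sort keys %input) {
    printf "%-12s length=%d found=%s\n",
        $case, length $input{$case}, search( $input{$case} ) ? 'yes' : 'no';
}
```

    Each entry of %input could then be wrapped in its own sub and handed to cmpthese.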

    ...roboticus

    There are lies, damned lies, and benchmarks.

      It looks from your code that, apart from Fregex and Rregex, your subroutine pairs are identical as you haven't swapped the 'sgml' and 'html' around. When I run your code with them swapped around I get these results. (Rates are slow as the machine is a quite elderly SPARC.)

      length of search string: 20904
                  Rate  Rregex  Fregex Rsubstr  Rindex Fsubstr  Findex
      Rregex   11708/s      --    -16%    -41%    -45%    -61%    -70%
      Fregex   14005/s     20%      --    -29%    -35%    -54%    -64%
      Rsubstr  19721/s     68%     41%      --     -8%    -35%    -50%
      Rindex   21423/s     83%     53%      9%      --    -29%    -45%
      Fsubstr  30362/s    159%    117%     54%     42%      --    -22%
      Findex   39142/s    234%    179%     98%     83%     29%      --

      Cheers,

      JohnGG

      Update: Fixed typo.

        johngg:

        D'oh!++ and I thought I had. I wonder what I was thinking? I *know* I changed the lines, but I guess I changed 'sgml' to 'sgml' and 'html' to 'html'... (Or did I change it twice during my confusion? I guess we'll never know.)

        Thanks for catching that!

        ...roboticus