
BrowserUk:

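Ummm ... you'll want to reverse those or clauses. Because or short-circuits, the second clause in your or never runs whenever the first clause matches. So with a string containing 'html', only the 'html' search gets timed.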
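Here is a minimal sketch of the short-circuit behaviour. It assumes a test string shaped like the one in roboticus's benchmark: it contains 'hTmL' and no 'sgml'. The helper count_runs is hypothetical and does not come from the thread. It counts how many times each search actually executes for each clause ordering.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Test string shaped like the benchmark's: contains 'hTmL' but no 'sgml'.
my $chunk = join '', 'a' .. 'j';
my $str   = $chunk x 90 . 'hTmL' . $chunk x 10;

# Hypothetical helper: evaluates "search $first or search $second" once and
# returns a hash ref counting how many times each search actually ran.
sub count_runs {
    my ($first, $second) = @_;
    my %ran = ($first => 0, $second => 0);
    do { $ran{$first}++;  $str =~ /\Q$first\E/i }
        or do { $ran{$second}++; $str =~ /\Q$second\E/i };
    return \%ran;
}

my $orig = count_runs('html', 'sgml');   # 'html' matches, so 'sgml' is skipped
my $rev  = count_runs('sgml', 'html');   # 'sgml' fails, so 'html' also runs

printf "html or sgml: html ran %d, sgml ran %d\n", @{$orig}{qw(html sgml)};
printf "sgml or html: html ran %d, sgml ran %d\n", @{$rev}{qw(html sgml)};
```

For the html-first ordering, the sgml search runs zero times. For the reversed ordering, both searches run once.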

```perl
#!/usr/bin/perl

use Benchmark qw(cmpthese);

$substr = join('','a'..'j');
$str = $substr x 90 . 'hTmL'. $substr x 2000;
print "length of search string: ",length $str, "\n";

cmpthese(-3, {
    'Fregex ' => sub {
        $str =~ /^\A.{0,996}?html/si or $str =~ /^\A.{0,996}?sgml/si;
    },
    'Fsubstr' => sub {
        substr( $str, 0, 1000) =~ /sgml/i or substr( $str, 0, 1000) =~ /html/i;
    },
    'Findex ' => sub {
        1+index( lc substr( $str, 0, 1000 ), 'sgml' )
            or 1+index( lc substr( $str, 0, 1000 ), 'html' )
    },
    'Rregex ' => sub {
        $str =~ /^\A.{0,996}?sgml/si or $str =~ /^\A.{0,996}?html/si;
    },
    'Rsubstr' => sub {
        substr( $str, 0, 1000) =~ /sgml/i or substr( $str, 0, 1000) =~ /html/i;
    },
    'Rindex' => sub {
        1+index( lc substr( $str, 0, 1000 ), 'sgml' )
            or 1+index( lc substr( $str, 0, 1000 ), 'html' )
    },
});

print substr $str, 900, 10;

__END__
root@swill ~/PerlMonks$ ./string_search2.pl
length of search string: 20904
            Rate  Rregex  Fregex Rsubstr Fsubstr  Findex  Rindex
Rregex   48562/s      --    -35%    -54%    -54%    -55%    -55%
Fregex   75225/s     55%      --    -28%    -29%    -30%    -30%
Rsubstr 104700/s    116%     39%      --     -1%     -2%     -3%
Fsubstr 105896/s    118%     41%      1%      --     -1%     -1%
Findex  107056/s    120%     42%      2%      1%      --     -0%
Rindex  107434/s    121%     43%      3%      1%      0%      --
hTmLabcdef
root@swill ~/PerlMonks$
```
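The regex and substr variants are meant to answer the same question. The pattern .{0,996} plus the four characters of 'html' spans exactly 1000 characters. So both variants should accept 'html' only when it starts at offset 996 or earlier.

The quick check below tests that claim. It slides 'HTML' across the boundary inside a filler string of 'x' characters. The names by_regex and by_substr are hypothetical. The redundant '^' is dropped, because without /m it means the same as '\A'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The two "first 1000 characters" tests from the benchmark, minus the
# redundant '^' (without /m it anchors at the same place as '\A').
sub by_regex  { $_[0] =~ /\A.{0,996}?html/si }
sub by_substr { substr($_[0], 0, 1000) =~ /html/i }

# Slide 'HTML' across the 1000-character boundary and compare the answers.
my @results;
for my $pos (990 .. 1000) {
    my $s = ('x' x $pos) . 'HTML' . ('x' x 50);
    my $r = by_regex($s)  ? 1 : 0;
    my $t = by_substr($s) ? 1 : 0;
    push @results, [ $pos, $r, $t ];
    printf "HTML at offset %4d: regex=%d substr=%d\n", $pos, $r, $t;
}
```

Offsets 990 to 996 should report 1 for both tests, and offsets 997 to 1000 should report 0 for both.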

Update: Had I used my brain, I'd've changed the $str definition to use 'sGmL' rather than edit the function definitions....
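The update suggests a different approach: change the data instead of the subs. A sketch of that idea follows. It assumes the original html-then-sgml clause order and keeps the subs fixed. The benchmark then runs once with 'hTmL' embedded and once with 'sGmL' embedded, so the second run exercises the second clause. The %found sanity table is a hypothetical addition. It confirms that each sub still finds its marker before the timing is trusted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $chunk = join '', 'a' .. 'j';
my $str;    # set per run below; the closures see whatever it currently holds

# Fixed subs, assuming the original 'html'-then-'sgml' order.
my %subs = (
    Fsubstr => sub {
        substr($str, 0, 1000) =~ /html/i or substr($str, 0, 1000) =~ /sgml/i;
    },
    Findex => sub {
        my $head = lc substr($str, 0, 1000);
        1 + index($head, 'html') or 1 + index($head, 'sgml');
    },
);

my %found;
for my $marker ('hTmL', 'sGmL') {
    # Only the data changes: with 'sGmL' the first clause fails,
    # so the second clause gets timed as well.
    $str = $chunk x 90 . $marker . $chunk x 2000;
    $found{$marker}{$_} = $subs{$_}->() ? 1 : 0 for keys %subs;
    print "marker $marker:\n";
    cmpthese(-1, \%subs);
}
```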

...roboticus

There are lies, damned lies, and benchmarks.

Re^4: string pattern match, limited to first 1000 characters?
by johngg (Canon) on Jun 23, 2007 at 17:40 UTC
    It looks from your code as though, apart from Fregex and Rregex, your subroutine pairs are identical, because you haven't swapped 'sgml' and 'html' around. When I run your code with them swapped, I get these results. (The rates are low because the machine is a rather elderly SPARC.)
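    One way to keep forward and reversed pairs from silently ending up identical is to build each pair from a single factory with the arguments swapped. The sketch below does this. The factories make_substr and make_index are hypothetical, not code from the thread. Because it interpolates the search words (\Q$first\E) instead of using literal patterns, its timings may differ somewhat from the figures posted here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $chunk = join '', 'a' .. 'j';
my $str   = $chunk x 90 . 'hTmL' . $chunk x 2000;

# Hypothetical factories: each returns a sub that searches the first 1000
# characters for $first and only tries $second if that fails.
sub make_substr {
    my ($first, $second) = @_;
    return sub {
        substr($str, 0, 1000) =~ /\Q$first\E/i
            or substr($str, 0, 1000) =~ /\Q$second\E/i;
    };
}

sub make_index {
    my ($first, $second) = @_;
    return sub {
        my $head = lc substr($str, 0, 1000);
        1 + index($head, $first) or 1 + index($head, $second);
    };
}

# Forward ('html' first) and reversed ('sgml' first) come from the same
# factory with swapped arguments, so a pair cannot silently end up identical.
my %subs = (
    Fsubstr => make_substr('html', 'sgml'),
    Rsubstr => make_substr('sgml', 'html'),
    Findex  => make_index('html', 'sgml'),
    Rindex  => make_index('sgml', 'html'),
);

cmpthese(-1, \%subs);
```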

    ```text
    length of search string: 20904
               Rate  Rregex  Fregex Rsubstr  Rindex Fsubstr  Findex
    Rregex  11708/s      --    -16%    -41%    -45%    -61%    -70%
    Fregex  14005/s     20%      --    -29%    -35%    -54%    -64%
    Rsubstr 19721/s     68%     41%      --     -8%    -35%    -50%
    Rindex  21423/s     83%     53%      9%      --    -29%    -45%
    Fsubstr 30362/s    159%    117%     54%     42%      --    -22%
    Findex  39142/s    234%    179%     98%     83%     29%      --
    ```

    Cheers,

    JohnGG

    Update: Fixed typo.

      johngg:

      D'oh!++ and I thought I had. I wonder what I was thinking? I *know* I changed the lines, but I guess I changed 'sgml' to 'sgml' and 'html' to 'html'... (Or did I change it twice during my confusion? I guess we'll never know.)

      Thanks for catching that!

      ...roboticus