Re^2: REGEXP: only need last matching string

When I read this, I thought "Don't be ridiculous! Yours is probably the most efficient method suggested!" Then I set about benchmarking the various suggestions to prove it. Alas, yours is only the second most efficient (in terms of time):

               Rate inman blazar2 blazar1 jeanluca thedoe2 prasadbabu drmoron salva thedoe1
inman      107580/s    --     -7%    -28%     -33%    -51%       -54%    -72%  -77%    -78%
blazar2    115313/s    7%      --    -23%     -28%    -48%       -51%    -70%  -75%    -76%
blazar1    149209/s   39%     29%      --      -7%    -32%       -37%    -61%  -68%    -69%
jeanluca   160157/s   49%     39%      7%       --    -27%       -32%    -58%  -65%    -67%
thedoe2    220552/s  105%     91%     48%      38%      --        -6%    -43%  -52%    -54%
prasadbabu 235856/s  119%    105%     58%      47%      7%         --    -39%  -49%    -51%
drmoron    384090/s  257%    233%    157%     140%     74%        63%      --  -16%    -20%
salva      459627/s  327%    299%    208%     187%    108%        95%     20%    --     -5%
thedoe1    481345/s  347%    317%    223%     201%    118%       104%     25%    5%      --

Here's the actual code I used:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;

# A negative count means "run each sub for at least that many CPU seconds".
my $n = shift || -5;

my $str = <<HERE;
abc 10
abc 11
abc 12
abc 13
abc 14
HERE

# Each candidate tries to pull the number off the last "abc" line into $dum.
cmpthese($n, {
   jeanluca     => sub { my $dum  = ($str =~ /abc\s(\d+)/gs)[-1]; },
   inman        => sub { () = $str =~ /abc\s(\d+)/g; my $dum = $1; },
   salva        => sub { (my $dum) = $str =~ /^.*abc\s(\d+)/s; },
   prasadbabu   => sub { (my $dum) = $str =~ /abc\s(\d+)$/; },
   blazar1      => sub { (my $dum) = reverse $str =~ /abc\s(\d+)/gs; },
   # "my" combined with a statement-modifier loop is undefined behaviour
   # (perlsyn), so $dum is declared before the loop.
   blazar2      => sub { my $dum; $dum = $1 while $str =~ /abc\s(\d+)/gs; },
   drmoron      => sub { (my $dum) = $str =~ /\d+$/gs; },
   thedoe1      => sub { (my $dum) = $str =~ /(?<!abc).*abc\s(\d+)/gs; },
   thedoe2      => sub { (my $dum) = $str =~ /abc(?!.*abc)\s(\d+)/s; },
});
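Rates only mean something if the candidates agree. As a sanity check that is not part of the original post, the sketch below rewrites each benchmarked expression as a sub that returns what it extracts (the `%candidate` hash and its calling convention are made up for this sketch) and checks the results against the five-line sample and against the 10,001-line string from the second run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each benchmarked expression, rewritten to return the number it extracts.
# (%candidate is a hypothetical name used only in this sketch.)
my %candidate = (
    jeanluca   => sub { my $s = shift; ($s =~ /abc\s(\d+)/gs)[-1] },
    inman      => sub { my $s = shift; () = $s =~ /abc\s(\d+)/g; $1 },
    salva      => sub { my $s = shift; my ($d) = $s =~ /^.*abc\s(\d+)/s; $d },
    prasadbabu => sub { my $s = shift; my ($d) = $s =~ /abc\s(\d+)$/; $d },
    blazar1    => sub { my $s = shift; my ($d) = reverse $s =~ /abc\s(\d+)/gs; $d },
    blazar2    => sub { my $s = shift; my $d; $d = $1 while $s =~ /abc\s(\d+)/gs; $d },
    drmoron    => sub { my $s = shift; my ($d) = $s =~ /\d+$/gs; $d },
    thedoe1    => sub { my $s = shift; my ($d) = $s =~ /(?<!abc).*abc\s(\d+)/gs; $d },
    thedoe2    => sub { my $s = shift; my ($d) = $s =~ /abc(?!.*abc)\s(\d+)/s; $d },
);

my %inputs = (
    small => join("", map { "abc $_\n" } 10 .. 14),    # same text as the heredoc
    big   => join("", map { "abc $_\n" } 0 .. 10000),  # the enlarged $str
);
my %expected = (small => 14, big => 10000);

# Collect every input/candidate pair that does not return the expected number.
my @mismatch;
for my $label (sort keys %inputs) {
    for my $name (sort keys %candidate) {
        my $got = $candidate{$name}->($inputs{$label});
        push @mismatch, "$label/$name"
            unless defined $got && $got == $expected{$label};
    }
}
print @mismatch ? "mismatches: @mismatch\n" : "all candidates agree\n";
```

Every candidate passes here, but partly because both inputs end with an `abc` line; prasadbabu's and drmoron's patterns rely on that through their trailing `$`.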

And when I make $str much bigger (via $str = join "", map { "abc $_\n" } 0..10000;), here's what I see:

               Rate   inman blazar2 blazar1 jeanluca prasadbabu thedoe2 drmoron salva thedoe1
inman        61.8/s      --     -1%     -6%     -32%       -55%    -68%    -84% -100%   -100%
blazar2      62.5/s      1%      --     -5%     -31%       -55%    -67%    -84% -100%   -100%
blazar1      65.7/s      6%      5%      --     -28%       -52%    -66%    -83% -100%   -100%
jeanluca     90.6/s     47%     45%     38%       --       -34%    -53%    -76% -100%   -100%
prasadbabu    137/s    122%    120%    109%      52%         --    -28%    -64% -100%   -100%
thedoe2       192/s    210%    207%    192%     111%        39%      --    -50% -100%   -100%
drmoron       381/s    517%    510%    481%     321%       178%     99%      --  -99%   -100%
salva       43612/s  70503%  69731%  66312%   48027%     31661%  22670%  11339%    --    -91%
thedoe1    496246/s 803274% 794484% 755582%  547517%    361304% 258998% 130059% 1038%      --

I'm not quite sure, though, why thedoe1 continues to perform so well: its rate hardly changes between the five-line string and the 10,001-line one. There must be something I'm overlooking, probably some optimization that perl is doing.
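One way to hunt for that optimization, which nobody tried in the thread, is to ask perl's regex compiler what it decided. The `re` pragma's `use re qw(Debug COMPILE)` form prints each pattern's compiled program plus the optimizer's notes (anchoring, fixed substrings, minimum length) to STDERR. The exact format differs between perl versions, so no output is reproduced here; the sketch assumes perl 5.10 or later, where the pragma is lexically scoped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %compiled;
{
    # Lexically scoped: only patterns compiled inside this block have their
    # compiled program and optimizer notes dumped to STDERR.
    use re qw(Debug COMPILE);
    %compiled = (
        salva   => qr/^.*abc\s(\d+)/s,
        thedoe1 => qr/(?<!abc).*abc\s(\d+)/s,   # /g dropped: qr// does not take it
    );
}

# With pos() unset, one non-/g match yields the same capture as the
# benchmarked list-context /g version, so the two stay comparable.
my $str = join "", map { "abc $_\n" } 0 .. 10000;
my %last;
for my $name (sort keys %compiled) {
    ($last{$name}) = $str =~ $compiled{$name};
    print "$name: $last{$name}\n";
}
```

Comparing the two STDERR dumps side by side is where any difference in how perl anchors or prescans the patterns would show up.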

duff

Re^3: REGEXP: only need last matching string by japhy (Canon) on Dec 24, 2005 at 21:23 UTC
I can see NO reason why thedoe1 should be performing any better than salva's code.

   salva   => sub { (my $dum) = $str =~ /^.*abc\s(\d+)/s; },
   thedoe1 => sub { (my $dum) = $str =~ /(?<!abc).*abc\s(\d+)/gs; },

Those regexes are equivalent. In fact, I can't understand why in the world thedoe used a look-behind there. It accomplishes nothing, since the first place the regex tries to match is at the beginning of the string. The only difference it could make is when `pos($str)` is something other than 0, and in that case it would not necessarily operate properly (insofar as what was requested from the regex).

Sorry to sound grumpy, but this is a misuse of a look-behind (and the /g modifier) that I think should be pointed out. There's no voodoo going on.

Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
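japhy's `pos()` caveat is easy to check. In this sketch, which is not from the thread, `pos($str)` is set by hand to just past the start of the last `abc`, standing in for an earlier scalar-context //g match that stopped there. A match without /g ignores `pos()` and starts at offset 0, while a /g match in list context starts scanning from `pos()`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = join "", map { "abc $_\n" } 10 .. 14;

# Pretend an earlier scalar-context //g match stopped just inside the last "abc".
pos($str) = rindex($str, 'abc') + 1;
my ($salva) = $str =~ /^.*abc\s(\d+)/s;              # no /g: pos() is ignored

pos($str) = rindex($str, 'abc') + 1;
my ($thedoe1) = $str =~ /(?<!abc).*abc\s(\d+)/gs;    # /g: starts at pos()

printf "salva:   %s\n", $salva   // 'no match';
printf "thedoe1: %s\n", $thedoe1 // 'no match';
```

The anchored version still finds 14, while the look-behind version finds nothing, because no `abc` remains after the position it was told to start from.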
Re^3: REGEXP: only need last matching string by salva (Canon) on Dec 25, 2005 at 10:35 UTC
Well, you are benchmarking a corner case: the one where the last `abc \d+` is at the end of the string. When I said that I didn't know about its performance, I was thinking of a less convenient case, for instance one where the matching substring is near the beginning. In that case the perl regexp engine is going to backtrack a lot, and looping with the OP's regexp and discarding all but the last match could perform better.
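salva's less convenient case can be sketched by putting the `abc` lines first and padding the rest of the string with lines that cannot match, so the greedy `.*` has to backtrack over almost the whole string. The filler text, its size and the choice of four candidates are assumptions made for this sketch, and no timings are quoted; the point is to run it and compare.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;

# Negative count: minimum CPU seconds per candidate.
my $n = shift || -2;

# Hypothetical input: the only "abc N" lines come first, followed by
# 10,000 lines of filler that the pattern cannot match.
my $str = join("", map { "abc $_\n" } 10 .. 14)
        . join("", map { "xyz $_\n" } 1 .. 10000);

my %sub = (
    jeanluca => sub { my $dum = ($str =~ /abc\s(\d+)/gs)[-1]; $dum },
    salva    => sub { (my $dum) = $str =~ /^.*abc\s(\d+)/s; $dum },
    blazar2  => sub { my $dum; $dum = $1 while $str =~ /abc\s(\d+)/gs; $dum },
    thedoe1  => sub { (my $dum) = $str =~ /(?<!abc).*abc\s(\d+)/gs; $dum },
);

# Sanity check before timing: every candidate must still find 14.
my @wrong = grep { ($sub{$_}->() // '') ne '14' } sort keys %sub;
die "wrong answer from: @wrong\n" if @wrong;

cmpthese($n, \%sub);
```

Here `^.*abc` and the look-behind version must walk back across all the filler, while the loop-and-keep-last approaches stop scanning for captures after five short matches.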