When I read this, I thought "Don't be ridiculous! Yours is probably the most efficient method suggested!" Then I set about benchmarking the various suggestions to prove it. Alas, yours is only the second most efficient (in terms of time):

               Rate inman blazar2 blazar1 jeanluca thedoe2 prasadbabu drmoron salva thedoe1
inman      107580/s    --     -7%    -28%     -33%    -51%       -54%    -72%  -77%    -78%
blazar2    115313/s    7%      --    -23%     -28%    -48%       -51%    -70%  -75%    -76%
blazar1    149209/s   39%     29%      --      -7%    -32%       -37%    -61%  -68%    -69%
jeanluca   160157/s   49%     39%      7%       --    -27%       -32%    -58%  -65%    -67%
thedoe2    220552/s  105%     91%     48%      38%      --        -6%    -43%  -52%    -54%
prasadbabu 235856/s  119%    105%     58%      47%      7%         --    -39%  -49%    -51%
drmoron    384090/s  257%    233%    157%     140%     74%        63%      --  -16%    -20%
salva      459627/s  327%    299%    208%     187%    108%        95%     20%    --     -5%
thedoe1    481345/s  347%    317%    223%     201%    118%       104%     25%    5%      --
Here's the actual code I used:
#!/usr/bin/perl
use Benchmark qw/cmpthese/;

my $n = shift || -5;

$str = <<HERE;
abc 10
abc 11
abc 12
abc 13
abc 14
HERE

cmpthese($n, {
    jeanluca   => sub { my $dum = ($str =~/abc\s(\d+)/gs)[-1] ; },
    inman      => sub { () = $str =~ /abc\s(\d+)/g; my $dum = $1; },
    salva      => sub { (my $dum) = $str =~ /^.*abc\s(\d+)/s; },
    prasadbabu => sub { (my $dum) = $str =~ /abc\s(\d+)$/; },
    blazar1    => sub { (my $dum) = reverse $str =~ /abc\s(\d+)/gs; },
    blazar2    => sub { my $dum = $1 while $str =~ /abc\s(\d+)/gs; },
    drmoron    => sub { (my $dum) = $str =~ /\d+$/gs; },
    thedoe1    => sub { (my $dum) = $str =~ /(?<!abc).*abc\s(\d+)/gs; },
    thedoe2    => sub { (my $dum) = $str =~ /abc(?!.*abc)\s(\d+)/s; },
});
And when I make $str much bigger (via $str = join "", map { "abc $_\n" } 0..10000;), here's what I see:
               Rate   inman blazar2 blazar1 jeanluca prasadbabu thedoe2 drmoron salva thedoe1
inman        61.8/s      --     -1%     -6%     -32%       -55%    -68%    -84% -100%   -100%
blazar2      62.5/s      1%      --     -5%     -31%       -55%    -67%    -84% -100%   -100%
blazar1      65.7/s      6%      5%      --     -28%       -52%    -66%    -83% -100%   -100%
jeanluca     90.6/s     47%     45%     38%       --       -34%    -53%    -76% -100%   -100%
prasadbabu    137/s    122%    120%    109%      52%         --    -28%    -64% -100%   -100%
thedoe2       192/s    210%    207%    192%     111%        39%      --    -50% -100%   -100%
drmoron       381/s    517%    510%    481%     321%       178%     99%      --  -99%   -100%
salva       43612/s  70503%  69731%  66312%   48027%     31661%  22670%  11339%    --    -91%
thedoe1    496246/s 803274% 794484% 755582%  547517%    361304% 258998% 130059% 1038%      --

Though I'm not quite sure why thedoe1 continues to perform so well. There must be something I'm overlooking. Probably some optimization that perl is doing.
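
One way to look for such an optimization is to have perl trace the regex itself. The following is only a sketch, assuming the same five-line sample string as in the benchmark; it uses the core `re` pragma's debug mode, which writes the compiled program and the match trace to STDERR. It does not by itself explain the numbers above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same sample data as the small benchmark.
my $str = join "", map { "abc $_\n" } 10 .. 14;

my $dum;
{
    # Lexically scoped: only regexes compiled inside this block are traced.
    # The trace goes to STDERR.
    use re 'debug';
    ($dum) = $str =~ /(?<!abc).*abc\s(\d+)/gs;
}

print "captured: $dum\n";
```

Swapping in salva's regex and comparing the two traces shows where the engine starts and how it backtracks for each pattern.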

Re^3: REGEXP: only need last matching string
by japhy (Canon) on Dec 24, 2005 at 21:23 UTC
    I can see NO reason why thedoe1 should be performing any better than salva's code.
    salva   => sub { (my $dum) = $str =~ /^.*abc\s(\d+)/s; },
    thedoe1 => sub { (my $dum) = $str =~ /(?<!abc).*abc\s(\d+)/gs; },
    Those regexes are equivalent. In fact, I can't understand why in the world thedoe used a look-behind there. It accomplishes nothing, since the first place the regex tries to match is at the beginning of the string. The only difference it could make is if pos($str) is something other than 0, and in that case it would not necessarily operate properly (insofar as what was requested from the regex). Sorry to sound grumpy, but this is a misuse of a look-behind (and of the /g modifier) that I think should be pointed out. There's no voodoo going on.
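
A quick check of that equivalence, as a minimal sketch: it assumes the five-line sample string from the benchmark and a fresh string with no pos() set, and the variable names `$salva` and `$thedoe1` are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same sample data as the benchmark; pos($str) is unset here.
my $str = join "", map { "abc $_\n" } 10 .. 14;

# salva's anchored greedy match.
my ($salva) = $str =~ /^.*abc\s(\d+)/s;

# thedoe1's variant with the look-behind and /g, in list context.
# At offset 0 nothing precedes the match, so (?<!abc) succeeds trivially.
my ($thedoe1) = $str =~ /(?<!abc).*abc\s(\d+)/gs;

print "salva:   $salva\n";
print "thedoe1: $thedoe1\n";
```

On this input both patterns capture 14, the number from the last line.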

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re^3: REGEXP: only need last matching string
by salva (Canon) on Dec 25, 2005 at 10:35 UTC
    Well, you are benchmarking a corner case: the one where the last abc \d+ is at the end of the string.

    When I said that I didn't know about its performance, I was thinking of a less convenient case, for instance one where the matching substring is near the beginning. In that case the perl regexp engine is going to backtrack a lot, and looping with the OP's regexp and discarding all but the last match might perform better.
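
A sketch of how that less convenient case could be benchmarked. The input string is hypothetical (two matches at the start, then a long tail of filler), `/abc\s(\d+)/` is assumed to be the OP's regexp, and the names `greedy` and `loop` are made up for the example. No timings are claimed here; the numbers depend on the perl version and the input.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;

# Hypothetical input: the only "abc N" matches sit at the start,
# followed by a long tail with no further matches.
my $str = "abc 10\nabc 11\n" . ("x" x 100_000);

my %candidates = (
    # Greedy .* runs to the end of the string, then backtracks to the last match.
    greedy => sub { my ($last) = $str =~ /^.*abc\s(\d+)/s; $last },
    # Walk every match from the left and keep only the last capture.
    loop   => sub { my $last; $last = $1 while $str =~ /abc\s(\d+)/g; $last },
);

# Sanity check: both approaches should report the same last number.
print "$_ => ", $candidates{$_}->(), "\n" for sort keys %candidates;

# Run each candidate for at least one CPU second and print the comparison.
cmpthese(-1, \%candidates);
```

Varying the length of the filler tail, or moving the last match toward the end, shows how sensitive each approach is to where the last match sits.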