in reply to Getting the number of times a regexp matches

Benchmark time! Whoohoo! With warnings and strict off and a brain dead match.

Re-Updated with t0mas' golfing and the quidity/dchetlin golf

#!/usr/bin/perl
use Benchmark qw(cmpthese);
use vars qw($g $c $p);

# Pattern under test; swap in one of the commented lines for the other runs.
$p = "b";
#$p = "[ba]";
#$p = "c";
$g = "abababababababababababababababababababababababababababababa";

# Run each snippet for at least 10 CPU seconds and compare the rates.
cmpthese (-10, {
    'map'   => '$c = scalar map {1} ($g =~ m/$p/g);',
    'array' => '$c = scalar @{[$g =~ m/$p/g]};',
    's//'   => '$c = ($g =~ s/($p)/$1/g);',
    'while' => '$c++ while ($g =~ m/$p/g);',
    'split' => '$c= (scalar split /$p/,$g) +($g=~/$p$/)-1;',
    '@_'    => '@_=($g =~ m/$p/g) and $c=1+$#_;',
    '()'    => '$c=()=$g=~/$p/g;',
});

        Rate   s//    @_   map array    () while split
s//   1028/s    --  -38%  -48%  -48%  -54%  -68%  -77%
@_    1653/s   61%    --  -16%  -17%  -26%  -49%  -63%
map   1971/s   92%   19%    --   -1%  -12%  -39%  -56%
array 1995/s   94%   21%    1%    --  -11%  -38%  -56%
()    2240/s  118%   35%   14%   12%    --  -31%  -50%
while 3243/s  215%   96%   65%   63%   45%    --  -28%
split 4486/s  336%  171%  128%  125%  100%   38%    --

Ok, the split solution kicks out deprecation warnings under -w, and abusing split this way is crappy anyway, BUT oh mama is it fast on my machine. Personally, I'd recommend the while solution as safe and clean. It's a shame I can't ++ mirod twice!
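
For anyone who wants the while version outside of a benchmark string, here is a minimal sketch of it running under strict and warnings. The sub name count_matches is made up for illustration and is not from the thread; the pattern is assumed to be an ordinary regex that cannot match the empty string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper name, not from the original post.
sub count_matches {
    my ($string, $pattern) = @_;
    my $count = 0;
    # In scalar context m//g finds one match per evaluation and resumes
    # where the previous match ended, so the increment runs once per match.
    $count++ while $string =~ /$pattern/g;
    return $count;
}

print count_matches("abababa", "b"), "\n";   # prints 3
print count_matches("abababa", "c"), "\n";   # prints 0
```

Note that the counter starts at zero inside the sub, so a failed match gives 0 rather than whatever the variable held before.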

Oh yeah, if the match fails (match "c") the results are a little different:

         Rate split array   s// while   map    @_    ()
split 14203/s    --  -43%  -63%  -71%  -72%  -74%  -76%
array 24826/s   75%    --  -36%  -48%  -52%  -55%  -59%
s//   38899/s  174%   57%    --  -19%  -24%  -30%  -35%
while 48184/s  239%   94%   24%    --   -6%  -13%  -20%
map   51481/s  262%  107%   32%    7%    --   -7%  -15%
@_    55225/s  289%  122%   42%   15%    7%    --   -8%
()    60282/s  324%  143%   55%   25%   17%    9%    --

while is warning-safe, fast, and has a cheap setup in mismatch cases. Plus, the more complex the match, the worse split will get:

Matching against [ab] for example:

        Rate   s//    @_   map array    () split while
s//    657/s    --  -10%  -22%  -25%  -27%  -43%  -47%
@_     729/s   11%    --  -14%  -17%  -19%  -36%  -42%
map    846/s   29%   16%    --   -4%   -6%  -26%  -32%
array  877/s   33%   20%    4%    --   -2%  -24%  -30%
()     898/s   37%   23%    6%    2%    --  -22%  -28%
split 1148/s   75%   57%   36%   31%   28%    --   -8%
while 1249/s   90%   71%   48%   42%   39%    9%    --
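
A side note on the split entry: the +($g=~/$p$/)-1 part is there because split drops trailing empty fields, so a string ending in a match comes back one field short. As far as I can tell that correction only covers a single trailing match (a string like "abb" against "b" would still be undercounted). The sketch below is one way to sidestep that, assuming the pattern has no capturing groups and never matches the empty string; count_by_split is a made-up name, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper name, not from the original post.
sub count_by_split {
    my ($string, $pattern) = @_;
    # A limit of -1 keeps trailing empty fields, so N matches always
    # yield N + 1 fields and no end-of-string correction is needed.
    my @fields = split /$pattern/, $string, -1;
    # split returns no fields at all for an empty string.
    return @fields ? @fields - 1 : 0;
}

print count_by_split("abababa", "b"), "\n";   # prints 3
print count_by_split("abb", "b"), "\n";       # prints 2
```

Splitting into a named array also avoids leaning on split in scalar context, which is the part that warns in the benchmark version.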

--

Updated. I still think the cleanest of the bunch is the while variation, and it shows its colors by ranking near the top in every variation. The () and array hacks stay right in there though, and both are clear and simple as well. As a final test, I ran the pattern 'c|\d+|ab' against my /var/log/lastlog (300KB) and this is what I got:

      s/iter   s// while split    () array   map    @_
s//     1.14    --   -4%   -4%   -5%   -5%   -5%   -6%
while   1.10    4%    --   -0%   -1%   -1%   -2%   -3%
split   1.10    4%    0%    --   -1%   -1%   -1%   -3%
()      1.09    5%    1%    1%    --   -0%   -0%   -2%
array   1.09    5%    1%    1%    0%    --   -0%   -2%
map     1.08    5%    2%    1%    0%    0%    --   -1%
@_      1.07    7%    3%    3%    2%    2%    1%    --
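
Since the () and array hacks are the tersest of the lot, here is a small sketch of both outside the benchmark. Both rely on m//g returning every match in list context. The variable names are mine, and the last line shows a caveat worth knowing: with capturing groups in the pattern, list-context m//g returns the captures, so the count is captures rather than matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "abababa";

# A list assignment evaluated in scalar context yields the number of
# elements on its right-hand side, even though they are assigned to ().
my $count_list = () = $text =~ /b/g;

# Build an anonymous array of the matches, then read its size.
my $count_array = @{[ $text =~ /b/g ]};

# Caveat: two matches with two capture groups each come back as four items.
my $count_captures = () = "abab" =~ /(a)(b)/g;

print "$count_list $count_array $count_captures\n";   # prints "3 3 4"
```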

*snort* I'll let you draw your own conclusions.

$you = new YOU;
honk() if $you->love(perl)

p.s. this post, my 321st, made me a bishop =)

(TMTOWTDI) Re (2): Getting the number of times a regexp matches
by mwp (Hermit) on Dec 07, 2000 at 17:23 UTC
    Darn. I was proud of the @{[]} trick I hacked together for this problem; too bad it scored so poorly. Ah well, thanks very much for the benchmarks, and congrats on the promotion. {g}

    This kind of reminds me of a show on A&E I caught a few minutes of the other day. It had Jeremy Irons in it and he was trying to rebuild an old clock, either from an old schematic or model, I'm not sure which. At any rate, it was one of the first shipboard clocks, one to counteract the effect the swaying deck had on the pendulum. At one point, he becomes irate, saying "...it's a terrible mess, layer and layer of complexity, one piece correcting for the last. The man absolutely refused to admit he was wrong and come up with other concepts." Or something to that effect. =)

    I just thought that fit nicely in with this. Presented with a problem and current behavior (m//g returns a list of matched values in list context) I used the ol' hammer-and-nail routine. It seemed to work well enough and made absolute sense to me. But some other folks went back to the root of the problem and came up with completely different solutions that worked from an oblique angle. Look at mirod's solution, for example, something I would have never even thought of. Amazing.

    The nature of Perl, I suppose...

    'kaboo