in reply to Getting the number of times a regexp matches
Re-Updated with t0mas' golfing and the quidity/dchetlin golf
```perl
#!/usr/bin/perl
use Benchmark qw(cmpthese);
use vars qw($g $c $p);

$p = "b";
#$p = "[ba]";
#$p = "c";
$g = "abababababababababababababababababababababababababababababa";

cmpthese (-10, {
    'map'   => '$c = scalar map {1} ($g =~ m/$p/g);',
    'array' => '$c = scalar @{[$g =~ m/$p/g]};',
    's//'   => '$c = ($g =~ s/($p)/$1/g);',
    'while' => '$c++ while ($g =~ m/$p/g);',
    'split' => '$c= (scalar split /$p/,$g) +($g=~/$p$/)-1;',
    '@_'    => '@_=($g =~ m/$p/g) and $c=1+$#_;',
    '()'    => '$c=()=$g=~/$p/g;',
});
```
```
         Rate   s//    @_   map array    () while split
s//    1028/s    --  -38%  -48%  -48%  -54%  -68%  -77%
@_     1653/s   61%    --  -16%  -17%  -26%  -49%  -63%
map    1971/s   92%   19%    --   -1%  -12%  -39%  -56%
array  1995/s   94%   21%    1%    --  -11%  -38%  -56%
()     2240/s  118%   35%   14%   12%    --  -31%  -50%
while  3243/s  215%   96%   65%   63%   45%    --  -28%
split  4486/s  336%  171%  128%  125%  100%   38%    --
```
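Before trusting numbers like these, it is worth checking that every idiom actually returns the same count. Here is a minimal sketch of such a check, assuming the idioms are wrapped as subs taking (string, pattern). The `%count` hash and the `agreed_count` helper are hypothetical names, and `split` is left out because its formula needs a correction term.

```perl
use strict;
use warnings;

# The counting idioms from the benchmark, wrapped as subs taking
# (string, pattern). Keys reuse the cmpthese labels; split is left out.
my %count = (
    'map'   => sub { my ($g, $p) = @_; return scalar map { 1 } ($g =~ m/$p/g) },
    'array' => sub { my ($g, $p) = @_; return scalar @{ [ $g =~ m/$p/g ] } },
    's//'   => sub { my ($g, $p) = @_; return 0 + ($g =~ s/($p)/$1/g) },
    'while' => sub { my ($g, $p) = @_; my $c = 0; $c++ while $g =~ m/$p/g; return $c },
    # a lexical @m stands in for @_ here
    '@_'    => sub { my ($g, $p) = @_; my @m = ($g =~ m/$p/g); return 1 + $#m },
    '()'    => sub { my ($g, $p) = @_; my $c = () = $g =~ m/$p/g; return $c },
);

# Hypothetical helper: the count every idiom agrees on, or undef if any differ.
sub agreed_count {
    my ($g, $p) = @_;
    my %distinct;
    $distinct{ $count{$_}->($g, $p) } = 1 for keys %count;
    return scalar(keys %distinct) == 1 ? (keys %distinct)[0] : undef;
}

my $g = "ab" x 29 . "a";    # alternating a/b, like the benchmark string
for my $p ('b', '[ba]', 'c') {
    my $n = agreed_count($g, $p);
    print "/$p/: ", (defined $n ? $n : 'idioms disagree'), "\n";
}
```

If any line reports a disagreement, the corresponding benchmark row is timing a different answer, not just a different technique.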
OK, the split solution emits deprecation warnings under -w, and abusing split this way is crappy anyway, BUT oh mama, is it fast on my machine. Personally, I'd recommend the while solution as safe and clean. It's a shame I can't ++ mirod twice!
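To make "abusing it is crappy" concrete, here is a sketch that wraps the split formula in a hypothetical `split_count` sub and compares it with a `while` count (`while_count`, also hypothetical) on a few edge cases. It assumes Perl 5.12 or later, where `split` in scalar context no longer writes to `@_`, so the deprecation warning does not fire there.

```perl
use strict;
use warnings;

# The split-based formula from the benchmark, wrapped in a hypothetical sub.
# Assumes Perl 5.12+, where scalar-context split no longer clobbers @_.
sub split_count {
    my ($g, $p) = @_;
    return (scalar split /$p/, $g) + ($g =~ /$p$/) - 1;
}

# Reference count using the while idiom.
sub while_count {
    my ($g, $p) = @_;
    my $c = 0;
    $c++ while $g =~ /$p/g;
    return $c;
}

# Each pair is (string, pattern); the split formula gets all three wrong.
my @cases = (
    [ "",      "b"   ],  # empty string: split yields zero fields, formula gives -1
    [ "abbb",  "b"   ],  # trailing run: trailing empty fields are dropped, gives 1
    [ "a-b-c", "(-)" ],  # capture group: separators come back as fields, gives 4
);
for my $case (@cases) {
    my ($g, $p) = @$case;
    printf "%-7s /%s/  split: %d  while: %d\n",
        "'$g'", $p, split_count($g, $p), while_count($g, $p);
}
```

The while count is right in all three cases because it counts matches directly instead of inferring them from the fields between matches.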
Oh yeah, if the match fails (pattern "c"), the results are a little different:
```
         Rate split array   s// while   map    @_    ()
split 14203/s    --  -43%  -63%  -71%  -72%  -74%  -76%
array 24826/s   75%    --  -36%  -48%  -52%  -55%  -59%
s//   38899/s  174%   57%    --  -19%  -24%  -30%  -35%
while 48184/s  239%   94%   24%    --   -6%  -13%  -20%
map   51481/s  262%  107%   32%    7%    --   -7%  -15%
@_    55225/s  289%  122%   42%   15%    7%    --   -8%
()    60282/s  324%  143%   55%   25%   17%    9%    --
```
while is warning-safe, fast, and cheap to set up when nothing matches. Plus, the more complex the pattern, the worse split fares. Matching against [ab], for example:
```
         Rate   s//    @_   map array    () split while
s//     657/s    --  -10%  -22%  -25%  -27%  -43%  -47%
@_      729/s   11%    --  -14%  -17%  -19%  -36%  -42%
map     846/s   29%   16%    --   -4%   -6%  -26%  -32%
array   877/s   33%   20%    4%    --   -2%  -24%  -30%
()      898/s   37%   23%    6%    2%    --  -22%  -28%
split  1148/s   75%   57%   36%   31%   28%    --   -8%
while  1249/s   90%   71%   48%   42%   39%    9%    --
```
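Since while keeps landing near the top, here is a minimal sketch of it packaged as a reusable sub. The name `count_matches` and the choice to pass a precompiled `qr//` pattern are assumptions for illustration. Copying the string into the sub also sidesteps a real pitfall of the inline idiom: `m//g` in scalar context resumes from `pos()`, so a leftover match position shortens the count.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping matches of a qr// pattern with
# the while idiom. No list of matches is built, so a string with no matches
# costs one failed match and nothing else.
sub count_matches {
    my ($str, $re) = @_;   # the copy starts with a fresh pos()
    my $count = 0;
    $count++ while $str =~ /$re/g;
    return $count;
}

print count_matches("abababa", qr/[ab]/), "\n";   # prints 7
print count_matches("abababa", qr/c/),    "\n";   # prints 0

# Caveat when the idiom is used inline on a variable that already has a pos():
my $s = "bbbb";
my $first = $s =~ /b/g;   # scalar-context //g leaves pos($s) at 1
my $n = 0;
$n++ while $s =~ /b/g;    # counts only the remaining 3 matches
```

After the loop's final failed match, `pos($s)` is reset, so a later count on `$s` starts from the beginning again.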
--
Updated. I still think the cleanest of the bunch is the while variation, and it is showing its colors by ranking near the top in every variation. The () and array hacks stay right in there, though, and both are clear and/or simple as well. As a final test, I ran the pattern 'c|\d+|ab' against my /var/log/lastlog (300 KB), and this is what I got:
```
      s/iter   s// while split    () array   map    @_
s//     1.14    --   -4%   -4%   -5%   -5%   -5%   -6%
while   1.10    4%    --   -0%   -1%   -1%   -2%   -3%
split   1.10    4%    0%    --   -1%   -1%   -1%   -3%
()      1.09    5%    1%    1%    --   -0%   -0%   -2%
array   1.09    5%    1%    1%    0%    --   -0%   -2%
map     1.08    5%    2%    1%    0%    0%    --   -1%
@_      1.07    7%    3%    3%    2%    2%    1%    --
```
*snort* I'll let you draw your own conclusions.
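For anyone who wants to rerun the whole-file test, here is a minimal sketch that slurps a file and counts with the while idiom. The helper name `count_in_file` is hypothetical, and reading in `:raw` mode is an assumption suited to binary files such as lastlog; at 300 KB, holding the whole file in memory is cheap.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a whole file and count non-overlapping matches
# of a qr// pattern with the while idiom.
sub count_in_file {
    my ($path, $re) = @_;
    # :raw because files like lastlog are binary, not text
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $data = do { local $/; <$fh> };   # local $/ is undef, so one read gets everything
    close $fh;
    my $count = 0;
    $count++ while $data =~ /$re/g;
    return $count;
}

# Example call with the pattern from the final test (path as in the post):
# print count_in_file('/var/log/lastlog', qr/c|\d+|ab/), "\n";
```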
$you = new YOU;
honk() if $you->love(perl)
P.S. This post, my 321st, made me a bishop =)
(TMTOWTDI) Re (2): Getting the number of times a regexp matches
by mwp (Hermit) on Dec 07, 2000 at 17:23 UTC