Speed of simple pattern count. A comparison

Hello,

I've been playing with various ways to count occurrences of a 1-2 character pattern in a longer string, and compared their speed. If the task is to count non-overlapping occurrences of an exact substring, there are many ways to do it, and if the pattern is a single ASCII character, there are even more.
Some of them (e.g. functions/chop and functions/chomp) destroy the target string, so I made a copy of it on every iteration.
I used several perl versions, up to 5.38.2 (year 2023).
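
To make "non-overlapping" concrete, here is a minimal sketch that contrasts it with overlapping counting. The helper names count_non_overlapping and count_overlapping are hypothetical and not part of the benchmark; the lookahead variant is shown only for contrast, since every benchmarked variant counts non-overlapping matches.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; not part of the benchmark.

# Each match consumes its characters, so matches cannot overlap.
sub count_non_overlapping {
    my ($str, $pat) = @_;
    my $n = () = $str =~ /\Q$pat\E/g;
    return $n;
}

# The lookahead consumes nothing, so every start position is tested.
sub count_overlapping {
    my ($str, $pat) = @_;
    my $n = () = $str =~ /(?=\Q$pat\E)/g;
    return $n;
}

printf "non-overlapping: %d, overlapping: %d\n",
    count_non_overlapping('aaaa', 'aa'),
    count_overlapping('aaaa', 'aa');
```

For a single-character pattern the two counts are always the same, which is one reason why that case allows more counting techniques.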

UPDATE. After jwkrahn's comment (Re: Speed of simple pattern count. A comparison) I added variants with functions/index and functions/rindex. I also added a functions/substr variant, which simply walks the string one character at a time (I did not include this variant for the two-character pattern, because there it would also need to advance the position past each match in order to count non-overlapping matches).

Variants for counting a single character, and their speed:

#!/usr/bin/perl

use strict;
use warnings;
use Benchmark 'cmpthese';

my $target = 'abc' x 1e4;

cmpthese(-1,{
    # tr/y with an empty replacement list counts without modifying
    'tr / y' =>   sub { my $m = 0; $m = $target =~ y/a//; },
    '=()=' =>     sub { my $m = 0; $m = () = $target =~ m/a/g; },
    'while' =>    sub { my $m = 0; $m ++ while $target =~ m/a/g; },
    # (*F) forces backtracking, so the code block runs once per match
    '(?{})' =>    sub { my $m = 0; $target =~ m/a(?{ $m ++ })(*F)/g; },
    'split_grep' => sub { my $m = 0;
        $m = grep $_ eq 'a', split '', $target;
        },
    # 'x' on both sides keeps leading/trailing matches from being lost
    'split_by' => sub { my $m = 0;
        $m = -1 + split 'a', 'x' . $target . 'x';
        },
    'chop' =>     sub { my $m = 0;
        my $target2 = $target;    # chop is destructive, work on a copy
        $m += ( 'a' eq chop $target2 ) while length $target2;
        },
    'chomp' =>    sub { my $m = 0;
        my $target2 = $target;    # chomp is destructive, work on a copy
        local $/ = 'a';
        0 while chomp $target2 and ++ $m or chop $target2;
        },
    'index' =>    sub { my $m = 0;
        my $pat_len = length 'a';
        my $pos = -$pat_len;
        $m ++ while -1 < ( $pos = index $target, 'a', $pos + $pat_len );
        },
    'rindex' =>   sub { my $m = 0;
        my $pat_len = length 'a';
        # start past the end so a match at the very end is not skipped
        my $pos = $pat_len + length $target;
        $m ++ while -1 < ( $pos = rindex $target, 'a', $pos - $pat_len );
        },
    'substr' =>   sub { my $m = 0;
        # -1 (not -2) so that the last character is inspected too
        $m += ( 'a' eq substr $target, $_, 1 ) for 0 .. -1 + length $target;
        },
});

OUTPUT:

perl-5.38.2
==========
              Rate split_grep  (?{}) substr   chop chomp  =()= while index rindex split_by tr / y
split_grep   214/s         --    -4%   -23%   -36%  -47%  -68%  -70%  -77%   -79%     -99%   -99%
(?{})        222/s         4%     --   -20%   -34%  -45%  -67%  -69%  -77%   -78%     -98%   -99%
substr       279/s        30%    26%     --   -17%  -31%  -59%  -61%  -71%   -73%     -98%   -99%
chop         336/s        57%    51%    20%     --  -16%  -50%  -53%  -65%   -67%     -98%   -99%
chomp        403/s        88%    81%    44%    20%    --  -40%  -44%  -58%   -61%     -97%   -99%
=()=         673/s       214%   203%   141%   100%   67%    --   -6%  -29%   -34%     -95%   -98%
while        718/s       235%   223%   157%   114%   78%    7%    --  -24%   -30%     -95%   -98%
index        948/s       342%   326%   240%   182%  135%   41%   32%    --    -8%     -93%   -97%
rindex      1028/s       380%   362%   268%   206%  155%   53%   43%    8%     --     -93%   -97%
split_by   14354/s      6599%  6359%  5045%  4170% 3466% 2032% 1899% 1415%  1297%       --   -59%
tr / y     34600/s     16047% 15470% 12301% 10192% 8496% 5039% 4718% 3551%  3267%     141%     --

perl-5.32.0
==========
              Rate split_grep  (?{}) substr  chomp  chop  =()= while rindex index split_by tr / y
split_grep   163/s         --   -16%   -45%   -46%  -53%  -68%  -78%   -83%  -83%     -99%  -100%
(?{})        194/s        19%     --   -34%   -36%  -44%  -62%  -73%   -79%  -79%     -99%   -99%
substr       296/s        82%    52%     --    -2%  -14%  -42%  -59%   -68%  -68%     -98%   -99%
chomp        302/s        85%    55%     2%     --  -13%  -41%  -59%   -68%  -68%     -98%   -99%
chop         345/s       112%    78%    17%    14%    --  -32%  -53%   -63%  -63%     -98%   -99%
=()=         511/s       213%   163%    72%    69%   48%    --  -30%   -46%  -46%     -97%   -99%
while        731/s       348%   276%   147%   142%  112%   43%    --   -22%  -22%     -96%   -98%
rindex       939/s       476%   383%   217%   211%  172%   84%   28%     --   -0%     -95%   -97%
index        939/s       476%   383%   217%   211%  172%   84%   28%     0%    --     -95%   -97%
split_by   18553/s     11271%  9442%  6162%  6048% 5272% 3532% 2436%  1876% 1875%       --   -46%
tr / y     34462/s     21022% 17623% 11531% 11319% 9878% 6645% 4611%  3570% 3568%      86%     --

perl-5.20.1
==========
              Rate split_grep substr   chop   =()=  chomp (?{}) while index rindex split_by tr / y
split_grep   181/s         --   -25%   -38%   -41%   -41%  -54%  -73%  -74%   -74%     -99%   -99%
substr       241/s        33%     --   -17%   -21%   -21%  -38%  -64%  -65%   -66%     -99%   -99%
chop         290/s        61%    20%     --    -5%    -5%  -26%  -57%  -58%   -59%     -98%   -99%
=()=         304/s        68%    26%     5%     --    -1%  -22%  -55%  -56%   -57%     -98%   -99%
chomp        307/s        70%    27%     6%     1%     --  -22%  -54%  -56%   -56%     -98%   -99%
(?{})        392/s       117%    62%    35%    29%    28%    --  -42%  -44%   -44%     -98%   -99%
while        673/s       273%   179%   132%   121%   120%   72%    --   -3%    -4%     -96%   -98%
index        697/s       286%   189%   140%   129%   127%   78%    4%    --    -1%     -96%   -98%
rindex       704/s       289%   192%   142%   131%   129%   80%    5%    1%     --     -96%   -98%
split_by   16747/s      9166%  6845%  5665%  5403%  5360% 4177% 2387% 2302%  2280%       --   -51%
tr / y     34296/s     18876% 14124% 11707% 11170% 11081% 8658% 4994% 4819%  4773%     105%     --

Transliteration (perlop#tr y / tr) is by far the fastest.
Splitting by the search value is second fastest, although it counts inversely, i.e. it counts the chunks between matches rather than the matches themselves. Edge cases matter here: the pattern may match right at the beginning and/or right at the end of the string, so I add 'x' on both sides, where 'x' should not be a substring of the pattern.
The other ways are far slower.
Looking across versions, I can see that (?{}) (perlre#(?{-code-})) became about 2x slower between 5.20 and 5.32. The results for perl 5.14 (not shown here) are similar to 5.20, except that there I need to use an 'our $m' variable in the (?{}) variant.
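
The split edge case can be seen in a small sketch. The helper names below are hypothetical. The sentinel version assumes, as in the benchmark, that 'x' cannot combine with the neighbouring characters to form a match. The third helper uses split's negative LIMIT instead of sentinels; it is an alternative that was not benchmarked here.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; not part of the benchmark.

# Naive: split drops trailing empty fields, so a match at the very
# end of the string is not counted.
sub count_by_split_naive {
    my ($str, $pat) = @_;
    my $fields = split /\Q$pat\E/, $str;    # scalar context: field count
    return $fields - 1;
}

# Sentinels: with 'x' on both sides no field is ever empty, so the
# number of fields is always the number of separators plus one.
# Assumption: 'x' cannot become part of a match (e.g. pattern 'xa').
sub count_by_split_sentinel {
    my ($str, $pat) = @_;
    my $fields = split /\Q$pat\E/, "x${str}x";
    return $fields - 1;
}

# Alternative: a negative LIMIT makes split keep trailing empty fields.
sub count_by_split_limit {
    my ($str, $pat) = @_;
    return 0 unless length $str;            # split of '' yields no fields
    my $fields = split /\Q$pat\E/, $str, -1;
    return $fields - 1;
}

printf "naive: %d, sentinel: %d, limit: %d\n",
    count_by_split_naive('abca', 'a'),
    count_by_split_sentinel('abca', 'a'),
    count_by_split_limit('abca', 'a');
```

For 'abca' and pattern 'a' the naive version reports 1 while the other two report 2, because the empty field after the final 'a' is discarded.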

And here are the variants (fewer than for a single character) with a two-character pattern:

# same preamble and $target as in the single-character script
cmpthese(-1,{
    '=()=' =>     sub { my $m = 0; $m = () = $target =~ m/ab/g; },
    'while' =>    sub { my $m = 0; $m ++ while $target =~ m/ab/g; },
    '(?{})' =>    sub { my $m = 0; $target =~ m/ab(?{ $m ++ })(*F)/g; },
    'split_by' => sub { my $m = 0;
        $m = -1 + split 'ab', 'x' . $target . 'x';
        },
    'chomp' =>    sub { my $m = 0;
        my $target2 = $target;    # chomp is destructive, work on a copy
        local $/ = 'ab';
        0 while chomp $target2 and ++ $m or chop $target2;
        },
    'index' =>    sub { my $m = 0;
        my $pat_len = length 'ab';
        my $pos = -$pat_len;
        $m ++ while -1 < ( $pos = index $target, 'ab', $pos + $pat_len );
        },
    'rindex' =>   sub { my $m = 0;
        my $pat_len = length 'ab';
        # start past the end so a match at the very end is not skipped
        my $pos = $pat_len + length $target;
        $m ++ while -1 < ( $pos = rindex $target, 'ab', $pos - $pat_len );
        },
});

OUTPUT:


perl-5.38.2
==========
           Rate    (?{})    chomp     =()=    while   rindex    index split_by
(?{})     216/s       --     -64%     -70%     -75%     -78%     -81%     -96%
chomp     599/s     177%       --     -17%     -31%     -40%     -47%     -90%
=()=      718/s     232%      20%       --     -18%     -28%     -36%     -88%
while     872/s     303%      46%      21%       --     -13%     -22%     -85%
rindex    999/s     362%      67%      39%      15%       --     -11%     -83%
index    1120/s     418%      87%      56%      28%      12%       --     -81%
split_by 5749/s    2559%     860%     701%     559%     475%     413%       --

perl-5.32.0
==========
           Rate    (?{})    chomp     =()=    while    index   rindex split_by
(?{})     228/s       --     -59%     -64%     -75%     -77%     -78%     -95%
chomp     555/s     143%       --     -12%     -39%     -44%     -47%     -89%
=()=      627/s     175%      13%       --     -31%     -37%     -41%     -88%
while     913/s     300%      65%      46%       --      -9%     -13%     -82%
index     999/s     338%      80%      59%       9%       --      -5%     -80%
rindex   1056/s     362%      90%      68%      16%       6%       --     -79%
split_by 5046/s    2110%     810%     705%     452%     405%     378%       --

perl-5.20.1
==========
           Rate     =()=    (?{})    chomp   rindex    index    while split_by
=()=      372/s       --     -19%     -22%     -54%     -54%     -56%     -96%
(?{})     462/s      24%       --      -3%     -43%     -43%     -45%     -95%
chomp     478/s      29%       4%       --     -41%     -41%     -43%     -95%
rindex    807/s     117%      75%      69%       --      -1%      -3%     -92%
index     814/s     119%      76%      70%       1%       --      -3%     -92%
while     836/s     125%      81%      75%       4%       3%       --     -91%
split_by 9752/s    2524%    2013%    1941%    1108%    1099%    1066%       --

Splitting by the search pattern is much faster than the other variants. But to use it I need to think about edge cases, and about accidental matches that could be created by appending or prepending the extra characters. Update: however, splitting by the search pattern seems to have become almost 2x slower somewhere between perl 5.20 and 5.32.

Update-2. When using while chop and while chomp, it is important to note that the loop terminates early if the pattern or the chopped character is '0', because '0' is a false value. To overcome this limitation I should have wrapped the call in length, e.g. while length cho(m)p.
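
A minimal sketch of this pitfall, using hypothetical helper names that are not part of the benchmark. The first helper uses the chopped character itself as the loop condition; the second wraps it in length, as suggested above.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; not part of the benchmark.

# The loop condition is the chopped character, so the loop stops as
# soon as that character is '0' (false), not only at the empty string.
sub count_chop_truthy {
    my ($str, $ch) = @_;
    my $m = 0;
    while (my $c = chop $str) {
        $m++ if $c eq $ch;
    }
    return $m;
}

# The loop condition is the length of the chopped character: 1 for any
# character, 0 only once the string is exhausted.
sub count_chop_length {
    my ($str, $ch) = @_;
    my $m = 0;
    while (length(my $c = chop $str)) {
        $m++ if $c eq $ch;
    }
    return $m;
}

printf "truthy: %d, length: %d\n",
    count_chop_truthy('a0a', 'a'),
    count_chop_length('a0a', 'a');
```

With the string 'a0a' the first helper counts only the final 'a' and then stops at '0', while the second counts both.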

Re: Speed of simple pattern count. A comparison by jwkrahn (Abbot) on Jan 07, 2024 at 05:05 UTC
I didn't see `index` or `rindex` anywhere in your code (another way to find and/or count strings inside other strings). Also, it seems to me that `'x'` should be prepended and appended to all the strings.

Naked blocks are fun! -- Randal L. Schwartz, Perl hacker
Re^2: Speed of simple pattern count. A comparison by rsFalse (Chaplain) on Jan 07, 2024 at 13:53 UTC
"Also, it seems to me that 'x' should be prepended and appended to all the strings."

I prepended and appended it only in the "split_by" variant, where it guarantees that the number of returned chunks is one more than the number of separators (occurrences of the split pattern). I think it is not relevant for the other variants.
Re^3: Speed of simple pattern count. A comparison by jwkrahn (Abbot) on Jan 07, 2024 at 21:16 UTC
Most of your tests use a 30,000 character string but your "split_by" test uses a 30,002 character string, so the tests are not exactly equivalent. Just saying ... `:)`
Re^4: Speed of simple pattern count. A comparison by rsFalse (Chaplain) on Jan 07, 2024 at 21:59 UTC