Hello,

I've just played with various ways to count one- or two-letter patterns in a longer string and compared their speed. If the task is to count non-overlapping occurrences of an exact substring, there are many ways to do it. If the task is to count a single ASCII character, there are even more.
Some approaches (e.g. chop and chomp) destroy the target string, so I made a copy of it on every iteration.
I used several perl versions, up to 5.38.2 (year 2023).

UPDATE. After jwkrahn's comment (Re: Speed of simple pattern count. A comparison) I added variants with index and rindex. I also added a substr variant, which simply walks the string one character at a time (I did not include this variant for the two-character pattern, because there it would also have to advance the position past each match in order to count non-overlapping matches).
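The position handling that a two-character substr variant would need could look roughly like the sketch below. It is not one of the benchmarked variants, and count_substr is a made-up helper name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the benchmark): count non-overlapping
# occurrences of $pat in $str by comparing substr() windows.
sub count_substr {
    my ($str, $pat) = @_;
    my $pat_len = length $pat;
    return 0 unless $pat_len;               # guard against an empty pattern
    my ($m, $pos) = (0, 0);
    my $last = length($str) - $pat_len;     # last position where a match can start
    while ($pos <= $last) {
        if ($pat eq substr $str, $pos, $pat_len) {
            $m++;
            $pos += $pat_len;               # jump over the match: no overlaps
        }
        else {
            $pos++;                         # no match here, advance one character
        }
    }
    return $m;
}

print count_substr('abc' x 3, 'ab'), "\n";  # prints 3
print count_substr('aaaa', 'aa'), "\n";     # prints 2 (non-overlapping)
```

The extra branch and the manual position update are the reason this is more work than the one-character loop.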

Single-character count variations and their speed:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';

my $target = 'abc' x 1e4;

cmpthese(-1, {
    # transliteration with an empty replacement list only counts
    'y' => sub {
        my $m = 0;
        $m = $target =~ y/a//;
    },
    '=()=' => sub {
        my $m = 0;
        $m = () = $target =~ m/a/g;
    },
    'while' => sub {
        my $m = 0;
        $m ++ while $target =~ m/a/g;
    },
    '(?{})' => sub {
        my $m = 0;
        $target =~ m/a(?{ $m ++ })(*F)/g;
    },
    'split_grep' => sub {
        my $m = 0;
        $m = grep $_ eq 'a', split '', $target;
    },
    # 'x' on both sides so that matches at either end still separate two fields
    'split_by' => sub {
        my $m = 0;
        $m = -1 + split 'a', 'x' . $target . 'x';
    },
    # chop and chomp modify the string, so they work on a copy
    'chop' => sub {
        my $m = 0;
        my $target2 = $target;
        $m += ( 'a' eq chop $target2 ) while length $target2;
    },
    'chomp' => sub {
        my $m = 0;
        my $target2 = $target;
        local $/ = 'a';
        0 while chomp $target2 and ++ $m or chop $target2;
    },
    'index' => sub {
        my $m = 0;
        my $pat_len = length 'a';
        my $pos = -$pat_len;
        $m ++ while -1 < ( $pos = index $target, 'a', $pos + $pat_len );
    },
    'rindex' => sub {
        my $m = 0;
        my $pat_len = length 'a';
        # start at the string length so the first search covers the last position
        my $pos = length $target;
        $m ++ while -1 < ( $pos = rindex $target, 'a', $pos - $pat_len );
    },
    'substr' => sub {
        my $m = 0;
        # run up to the last index (length - 1) so the final character is checked
        $m += ( 'a' eq substr $target, $_, 1 ) for 0 .. -1 + length $target;
    },
});
OUTPUT:
perl-5.38.2
==========
              Rate split_grep  (?{}) substr   chop chomp  =()= while index rindex split_by tr / y
split_grep   214/s         --    -4%   -23%   -36%  -47%  -68%  -70%  -77%   -79%     -99%   -99%
(?{})        222/s         4%     --   -20%   -34%  -45%  -67%  -69%  -77%   -78%     -98%   -99%
substr       279/s        30%    26%     --   -17%  -31%  -59%  -61%  -71%   -73%     -98%   -99%
chop         336/s        57%    51%    20%     --  -16%  -50%  -53%  -65%   -67%     -98%   -99%
chomp        403/s        88%    81%    44%    20%    --  -40%  -44%  -58%   -61%     -97%   -99%
=()=         673/s       214%   203%   141%   100%   67%    --   -6%  -29%   -34%     -95%   -98%
while        718/s       235%   223%   157%   114%   78%    7%    --  -24%   -30%     -95%   -98%
index        948/s       342%   326%   240%   182%  135%   41%   32%    --    -8%     -93%   -97%
rindex      1028/s       380%   362%   268%   206%  155%   53%   43%    8%     --     -93%   -97%
split_by   14354/s      6599%  6359%  5045%  4170% 3466% 2032% 1899% 1415%  1297%       --   -59%
tr / y     34600/s     16047% 15470% 12301% 10192% 8496% 5039% 4718% 3551%  3267%     141%     --

perl-5.32.0
==========
              Rate split_grep  (?{}) substr  chomp  chop  =()= while rindex index split_by tr / y
split_grep   163/s         --   -16%   -45%   -46%  -53%  -68%  -78%   -83%  -83%     -99%  -100%
(?{})        194/s        19%     --   -34%   -36%  -44%  -62%  -73%   -79%  -79%     -99%   -99%
substr       296/s        82%    52%     --    -2%  -14%  -42%  -59%   -68%  -68%     -98%   -99%
chomp        302/s        85%    55%     2%     --  -13%  -41%  -59%   -68%  -68%     -98%   -99%
chop         345/s       112%    78%    17%    14%    --  -32%  -53%   -63%  -63%     -98%   -99%
=()=         511/s       213%   163%    72%    69%   48%    --  -30%   -46%  -46%     -97%   -99%
while        731/s       348%   276%   147%   142%  112%   43%    --   -22%  -22%     -96%   -98%
rindex       939/s       476%   383%   217%   211%  172%   84%   28%     --   -0%     -95%   -97%
index        939/s       476%   383%   217%   211%  172%   84%   28%     0%    --     -95%   -97%
split_by   18553/s     11271%  9442%  6162%  6048% 5272% 3532% 2436%  1876% 1875%       --   -46%
tr / y     34462/s     21022% 17623% 11531% 11319% 9878% 6645% 4611%  3570% 3568%      86%     --

perl-5.20.1
==========
              Rate split_grep substr   chop   =()=  chomp (?{}) while index rindex split_by tr / y
split_grep   181/s         --   -25%   -38%   -41%   -41%  -54%  -73%  -74%   -74%     -99%   -99%
substr       241/s        33%     --   -17%   -21%   -21%  -38%  -64%  -65%   -66%     -99%   -99%
chop         290/s        61%    20%     --    -5%    -5%  -26%  -57%  -58%   -59%     -98%   -99%
=()=         304/s        68%    26%     5%     --    -1%  -22%  -55%  -56%   -57%     -98%   -99%
chomp        307/s        70%    27%     6%     1%     --  -22%  -54%  -56%   -56%     -98%   -99%
(?{})        392/s       117%    62%    35%    29%    28%    --  -42%  -44%   -44%     -98%   -99%
while        673/s       273%   179%   132%   121%   120%   72%    --   -3%    -4%     -96%   -98%
index        697/s       286%   189%   140%   129%   127%   78%    4%    --    -1%     -96%   -98%
rindex       704/s       289%   192%   142%   131%   129%   80%    5%    1%     --     -96%   -98%
split_by   16747/s      9166%  6845%  5665%  5403%  5360% 4177% 2387% 2302%  2280%       --   -51%
tr / y     34296/s     18876% 14124% 11707% 11170% 11081% 8658% 4994% 4819%  4773%     105%     --
Transliteration (y/// or tr///, see perlop) is by far the fastest.
Splitting on the search value is the second fastest, although it counts inversely, i.e. it returns the number of non-matching chunks, which is one more than the number of matches. Edge cases matter here: the pattern may match right at the beginning and/or right at the end of the string (and split discards trailing empty fields), so I add 'x' on both sides; 'x' must not be a substring of the pattern.
The other ways are much slower.
Looking across versions, I can see that (?{}) (see "(?{ code })" in perlre) became about 2x slower between 5.20 and 5.32. The results for perl 5.14 (not shown here) are similar to 5.20, except that I had to use an 'our $m' variable with the (?{}) variant.
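The split edge case mentioned above can be seen on a tiny string. The following is a minimal sketch with an illustrative target 'abca' (not the benchmark string). The third line, using a negative LIMIT, is an alternative that was not benchmarked in this post; it relies on the documented behaviour that a negative LIMIT keeps trailing empty fields.

```perl
use strict;
use warnings;

my $target = 'abca';    # illustrative string: 'a' matches at both ends

# Without padding the trailing empty field is dropped: fields are ('', 'bc').
my $unpadded = -1 + split 'a', $target;

# With padding every match sits between two fields: ('x', 'bc', 'x').
my $padded = -1 + split 'a', 'x' . $target . 'x';

# Alternative (not benchmarked here): a negative LIMIT keeps trailing
# empty fields, giving ('', 'bc', '').
my $limited = -1 + split 'a', $target, -1;

print "$unpadded $padded $limited\n";   # prints "1 2 2"
```

Only the padded and the LIMIT forms report both occurrences of 'a'. Note that the LIMIT form would still need a special case for an empty target string, for which split returns no fields at all.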

And here are the variations (fewer than for the single-character search) with a two-character pattern:
use strict;
use warnings;
use Benchmark 'cmpthese';

my $target = 'abc' x 1e4;    # same target string as in the single-character test

cmpthese(-1, {
    '=()=' => sub {
        my $m = 0;
        $m = () = $target =~ m/ab/g;
    },
    'while' => sub {
        my $m = 0;
        $m ++ while $target =~ m/ab/g;
    },
    '(?{})' => sub {
        my $m = 0;
        $target =~ m/ab(?{ $m ++ })(*F)/g;
    },
    # 'x' on both sides so that matches at either end still separate two fields
    'split_by' => sub {
        my $m = 0;
        $m = -1 + split 'ab', 'x' . $target . 'x';
    },
    # chomp modifies the string, so it works on a copy
    'chomp' => sub {
        my $m = 0;
        my $target2 = $target;
        local $/ = 'ab';
        0 while chomp $target2 and ++ $m or chop $target2;
    },
    'index' => sub {
        my $m = 0;
        my $pat_len = length 'ab';
        my $pos = -$pat_len;
        $m ++ while -1 < ( $pos = index $target, 'ab', $pos + $pat_len );
    },
    'rindex' => sub {
        my $m = 0;
        my $pat_len = length 'ab';
        # start at the string length so the first search covers the last position
        my $pos = length $target;
        $m ++ while -1 < ( $pos = rindex $target, 'ab', $pos - $pat_len );
    },
});
OUTPUT:
perl-5.38.2
==========
           Rate (?{}) chomp =()= while rindex index split_by
(?{})     216/s    --  -64% -70%  -75%   -78%  -81%     -96%
chomp     599/s  177%    -- -17%  -31%   -40%  -47%     -90%
=()=      718/s  232%   20%   --  -18%   -28%  -36%     -88%
while     872/s  303%   46%  21%    --   -13%  -22%     -85%
rindex    999/s  362%   67%  39%   15%     --  -11%     -83%
index    1120/s  418%   87%  56%   28%    12%    --     -81%
split_by 5749/s 2559%  860% 701%  559%   475%  413%       --

perl-5.32.0
==========
           Rate (?{}) chomp =()= while index rindex split_by
(?{})     228/s    --  -59% -64%  -75%  -77%   -78%     -95%
chomp     555/s  143%    -- -12%  -39%  -44%   -47%     -89%
=()=      627/s  175%   13%   --  -31%  -37%   -41%     -88%
while     913/s  300%   65%  46%    --   -9%   -13%     -82%
index     999/s  338%   80%  59%    9%    --    -5%     -80%
rindex   1056/s  362%   90%  68%   16%    6%     --     -79%
split_by 5046/s 2110%  810% 705%  452%  405%   378%       --

perl-5.20.1
==========
           Rate  =()= (?{}) chomp rindex index while split_by
=()=      372/s    --  -19%  -22%   -54%  -54%  -56%     -96%
(?{})     462/s   24%    --   -3%   -43%  -43%  -45%     -95%
chomp     478/s   29%    4%    --   -41%  -41%  -43%     -95%
rindex    807/s  117%   75%   69%     --   -1%   -3%     -92%
index     814/s  119%   76%   70%     1%    --   -3%     -92%
while     836/s  125%   81%   75%     4%    3%    --     -91%
split_by 9752/s 2524% 2013% 1941%  1108% 1099% 1066%       --
Splitting on the search pattern is much faster than the other variants. But to use it I need to think about edge cases, and about accidental matches that the appended or prepended characters could create together with the neighbouring text. Update. However, splitting on the search pattern seems to have become almost 2x slower somewhere between perl-5.20 and 5.32.
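To illustrate the padding concern: if the pad character can take part in a match, the padded string may contain a match that the original string does not. The sketch below shows such a false count and one possible way around it. count_by_split is a hypothetical helper, and choosing the pad from a fixed candidate list is an assumption made for this example, not something from the benchmark. It assumes a literal (non-regex) pattern.

```perl
use strict;
use warnings;

my $target = 'abcabc';      # contains no 'xa' at all

# Padding with 'x' gives 'xabcabcx', which now starts with 'xa',
# so the split-based count reports one match that is not really there.
my $wrong = -1 + split 'xa', 'x' . $target . 'x';

# Hypothetical helper: pad with a character that does not occur in the
# literal pattern, so no match can ever include a pad character.
sub count_by_split {
    my ($str, $pat) = @_;
    return 0 unless length $pat;            # guard against an empty pattern
    my ($pad) = grep { index($pat, $_) < 0 } 'a' .. 'z', 'A' .. 'Z', 0 .. 9;
    die "no usable pad character\n" unless defined $pad;
    return -1 + split /\Q$pat\E/, $pad . $str . $pad;
}

print "$wrong\n";                               # prints 1 (a false match)
print count_by_split($target, 'xa'), "\n";      # prints 0
print count_by_split($target, 'ab'), "\n";      # prints 2
```

The \Q...\E quoting keeps the pattern literal, which is what makes the "pad character not in the pattern" argument hold.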

Update-2. When using while chop and while chomp, it is important to note that the loop terminates early if the chopped character is '0', because '0' is a false value. To overcome this limitation I should have tested the length instead, e.g. while length cho(m)p.
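A small sketch of that early termination, using the same chomp/chop loop shape as the benchmark but on an illustrative string that contains '0'. count_chomp and its $use_length flag are made-up names for this example only.

```perl
use strict;
use warnings;

# Hypothetical helper: count trailing-'a' removals with the chomp/chop loop.
# With $use_length false the loop condition is the chopped character itself;
# with $use_length true it is the length of the chopped character.
sub count_chomp {
    my ($str, $use_length) = @_;
    local $/ = 'a';
    my $m = 0;
    if ($use_length) {
        0 while (chomp($str) and ++$m) or length(chop $str);
    }
    else {
        0 while (chomp($str) and ++$m) or chop $str;
    }
    return $m;
}

print count_chomp('a0a', 0), "\n";  # prints 1: the loop stops at the '0'
print count_chomp('a0a', 1), "\n";  # prints 2: both 'a' characters counted
```

For the benchmark string 'abc' x 1e4 both forms give the same count, since it contains no '0'.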

Re: Speed of simple pattern count. A comparison
by jwkrahn (Abbot) on Jan 07, 2024 at 05:05 UTC

    I didn't see index or rindex anywhere in your code (another way to find and/or count strings inside other strings.)

    Also, it seems to me that 'x' should be prepended and appended to all the strings.

    Naked blocks are fun! -- Randal L. Schwartz, Perl hacker
      Also, it seems to me that 'x' should be prepended and appended to all the strings.

      I prepended and appended it only in the "split_by" variation, where it guarantees that the number of returned chunks is one more than the number of separators (split pattern occurrences). I think it is not relevant for the other variations.

        Most of your tests use a 30,000 character string but your "split_by" test uses a 30,002 character string so the tests are not exactly equivalent.

        Just saying ...     :)

        Naked blocks are fun! -- Randal L. Schwartz, Perl hacker