in reply to Confused by RegEx count

Other monks have already explained what's going on. Let me point out the efficiency of the solutions:

Note that the transliteration is much faster than the other option. Even when the character is variable and we have to use string eval (whip! whip!), it's much faster.

Instead of using substitution with length, you can use global substitution only, as it returns the number of replacements in scalar context. But it's still slower than transliteration:

#! /usr/bin/perl
use warnings;
use strict;

use Benchmark qw{ cmpthese };

my $orig = 'Just another Perl hacker,' x 100;
my $str  = $orig;
my $char = 'r';
my $q    = quotemeta $char;

sub transliteration { my $count = eval "\$str =~ tr/$q//" }
sub length_subst    { my $count = length( $str =~ s/[^$q]//rg ) }
sub subst           { my $count = $str =~ s/$q/$char/g }

transliteration() eq length_subst() or die 'Different t-ls';
transliteration() eq subst()        or die 'Different t-s';
$orig eq $str                       or die 'Changed';

cmpthese(-3, {
    transliteration => \&transliteration,
    length_subst    => \&length_subst,
    subst           => \&subst,
});

__END__
                    Rate length_subst   subst transliteration
length_subst      2833/s           --    -91%            -97%
subst            30244/s         968%      --            -70%
transliteration 102423/s        3515%    239%              --
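The return value that subst relies on can be seen in isolation. This is only a small sketch; the sample string and the variable names $count and $none are chosen for illustration. As far as I know, when nothing matches, s///g returns a false value (empty as a string, zero as a number) rather than a plain 0, which matters if you print the count.

```perl
use strict;
use warnings;

my $str = 'Just another Perl hacker,';

# Replace every 'r' by itself: the string stays the same,
# and the scalar-context return value is the number of replacements.
my $count = $str =~ s/r/r/g;

# No 'z' in the string: s///g returns false here, which is
# numerically zero but stringifies as the empty string.
my $none = $str =~ s/z/z/g;

printf "%d %d\n", $count, $none;    # prints "3 0"
```

If the count is going to be printed as a string, adding 0 to it (or formatting with %d as above) avoids the empty output for the no-match case.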

Update: Introduced quotemeta to transliteration, too. It didn't change the results significantly.
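To show why the quotemeta matters once the character is not a plain letter, here is a minimal sketch that wraps the string-eval transliteration in a function. count_char is a hypothetical helper name, not something from the benchmark above, and it assumes a single character is passed in.

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences of one character via tr///.
# tr/// does not interpolate variables, hence the string eval.
sub count_char {
    my ($str, $char) = @_;

    # Escape characters that are special inside tr///, e.g. '/', '-' or '\'.
    my $q = quotemeta $char;

    # With an empty replacement list and no /d, tr only counts
    # and leaves $str unchanged.
    my $count = eval "\$str =~ tr/$q//";
    die $@ if $@;
    return $count;
}

print count_char('Just another Perl hacker,', 'r'), "\n";    # 3
print count_char('a/b/c',   '/'), "\n";                      # 2
print count_char('a-b-c-d', '-'), "\n";                      # 3
```

Without the quotemeta, a '/' would end the tr/// early and the eval would fail to compile, so the escaping is about correctness more than speed.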

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Re^2: Confused by RegEx count
by NERDVANA (Priest) on Feb 21, 2024 at 00:06 UTC
    Or you could just match instead of substituting, which is a few percent faster.
    sub match { my $count =()= $str =~ /$q/g }
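For readers who have not met the =()= idiom: a minimal sketch of what it does, using a short sample string chosen for illustration. Assigning to an empty list forces list context on the match, and the list assignment itself, evaluated in scalar context, yields the number of elements on its right-hand side.

```perl
use strict;
use warnings;

my $str = 'Just another Perl hacker,';
my $q   = quotemeta 'r';

# The match runs in list context and returns every matched string;
# the list assignment in scalar context gives the number of them.
my $count = () = $str =~ /$q/g;

# The same thing spelled out with an explicit array.
my @matches     = $str =~ /$q/g;
my $spelled_out = scalar @matches;

print "$count $spelled_out\n";    # prints "3 3"
```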
      Interestingly, on my machine:
                         Rate length_subst match subst transliteration
      length_subst     2864/s           --  -89%  -90%            -97%
      match           25687/s         797%    --  -13%            -74%
      subst           29356/s         925%   14%    --            -70%
      transliteration 98682/s        3346%  284%  236%              --

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

        That's rather counter-intuitive. Just ran them myself and for me, match beats subst:

                           Rate length_subst subst match transliteration
        length_subst     2080/s           --  -88%  -90%            -97%
        subst           17998/s         765%    --  -16%            -77%
        match           21423/s         930%   19%    --            -72%
        transliteration 76797/s        3592%  327%  258%              --

        This is perl 5, version 34, subversion 0 (v5.34.0) built for x86_64-linux-thread-multi.


        🦛

        I get results like hippo's with perl 5.36 on a recent AMD Ryzen:

                            Rate length_subst subst match transliteration
        length_subst      8970/s           --  -89%  -91%            -97%
        subst            78884/s         779%    --  -19%            -78%
        match            97126/s         983%   23%    --            -73%
        transliteration 355019/s        3858%  350%  266%              --

        Though, depending on how many matches there are, does perl have to assemble a stack of N elements (copying each character into its own scalar) before assigning the list to the scalar to get the count? With the subst, the right optimizations could allow perl to update that one character in place, without changing the length of the string or copying anything, so it could be fast, and it wouldn't need to assemble a list of the matches.
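One way to probe that question would be a variant that never builds a list: match in scalar context inside a loop and increment a counter. This is only a sketch; match_list and match_loop are hypothetical names, and I make no claim about which one is faster. It would have to be added to the cmpthese call above and measured.

```perl
use strict;
use warnings;

my $str = 'Just another Perl hacker,' x 100;
my $q   = quotemeta 'r';

# List-context match: all matches are returned, then counted.
sub match_list {
    my $count = () = $str =~ /$q/g;
    return $count;
}

# Scalar-context match: each iteration resumes from pos($str),
# so no list of matches is built. pos() is reset when the match
# finally fails, so repeated calls start from the beginning again.
sub match_loop {
    my $count = 0;
    $count++ while $str =~ /$q/g;
    return $count;
}

print match_list(), ' ', match_loop(), "\n";    # prints "300 300"
```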