in reply to Re: Confused by RegEx count
in thread Confused by RegEx count

Or you could just match instead of substituting, which is a few percent faster.
sub match { my $count =()= $str =~ /$q/g }
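
For anyone unfamiliar with the `=()=` idiom used above, here is a minimal sketch of why it yields a count. The sample string and pattern are hypothetical, not the ones from the thread. A list assignment evaluated in scalar context returns the number of elements on its right-hand side, and `//g` in list context returns every match.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the thread.
my $str = 'a,b,,c,';
my $q   = ',';

# //g in list context returns all matches; assigning that list to the
# empty list () and evaluating the assignment in scalar context gives
# the number of elements on the right-hand side, i.e. the match count.
my $count = () = $str =~ /$q/g;

# Without the empty list, the match runs in scalar context and only
# reports whether a match was found.
my $found = $str =~ /$q/g;

print "count=$count found=$found\n";
```

The empty list is what forces list context on the match; the outer scalar assignment then turns the list assignment into a count.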

Re^3: Confused by RegEx count
by choroba (Cardinal) on Feb 21, 2024 at 09:23 UTC
    Interestingly, on my machine:
                       Rate length_subst match subst transliteration
    length_subst     2864/s           --  -89%  -90%            -97%
    match           25687/s         797%    --  -13%            -74%
    subst           29356/s         925%   14%    --            -70%
    transliteration 98682/s        3346%  284%  236%              --

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      That's rather counter-intuitive. Just ran them myself and for me, match beats subst:

                         Rate length_subst subst match transliteration
      length_subst     2080/s           --  -88%  -90%            -97%
      subst           17998/s         765%    --  -16%            -77%
      match           21423/s         930%   19%    --            -72%
      transliteration 76797/s        3592%  327%  258%              --

      This is perl 5, version 34, subversion 0 (v5.34.0) built for x86_64-linux-thread-multi.


      🦛

        Yet another data point with cygwin, everything updated 2/21/24.
                            Rate    length_subst           subst transliteration
        length_subst      6174/s              --            -92%            -98%
        subst            79622/s           1190%              --            -78%
        transliteration 355231/s           5654%            346%              --
        
        Yes, I'm getting similar results in 5.39.6, but in 5.26.1, the substitution is faster.

      I get results like hippo's with perl 5.36 on a recent AMD Ryzen:

                          Rate length_subst subst match transliteration
      length_subst      8970/s           --  -89%  -91%            -97%
      subst            78884/s         779%    --  -19%            -78%
      match            97126/s         983%   23%    --            -73%
      transliteration 355019/s        3858%  350%  266%              --

      Though, depending on how many matches there are, does perl have to assemble a stack of N elements (copying each matched character into its own scalar) before assigning the list to the scalar to get the count? With the subst, the right optimizations could allow it to update that one character in place, without changing the length of the string or copying anything, so it could be fast, and it never needs to assemble a list of the matches.
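
One way to probe the question above is to benchmark a counting variant that never builds a list of matches. The following is only a sketch: the test string, the pattern, and the sub names are assumptions, since the code behind the tables in this thread is not shown here. It compares the list-assignment count with a scalar-context `//g` loop, a substitution on a copy, and `tr///`.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 10,000 characters with a '.' at every 7th position.
my $str = join '', map { $_ % 7 ? 'x' : '.' } 1 .. 10_000;
my $q   = qr/\./;

# Builds the complete list of matches, then counts its elements.
sub match_list {
    my $count = () = $str =~ /$q/g;
    return $count;
}

# Scalar-context //g finds one match per iteration, so no list is built.
sub match_loop {
    my $count = 0;
    $count++ while $str =~ /$q/g;
    return $count;
}

# s///g returns the number of substitutions made. It works on a copy so
# that $str is left untouched; the copy itself adds to the measured time.
sub subst {
    my $copy  = $str;
    my $count = ( $copy =~ s/$q/./g );
    return $count;
}

# tr/// with an empty replacement list counts characters without
# modifying the string; it only works for a fixed set of characters.
sub transliteration {
    my $count = ( $str =~ tr/.// );
    return $count;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    match_list      => \&match_list,
    match_loop      => \&match_loop,
    subst           => \&subst,
    transliteration => \&transliteration,
} );
```

If `match_loop` comes out clearly ahead of `match_list`, that would be consistent with the list-building cost suggested above, though the ranking may well differ between perl versions and inputs, as the tables in this thread already show.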