Thanks for the tip, I'll use a hash look-up in the refactored version. Also, I was wrong about numification: the picture remains the same with string comparison (hello, AI). The unexpected outcome (for me) is "never access $1, etc. more than twice per regexp executed, but assign results to throwaway lexicals instead, even if the 'access' is masked/folded in loops". Interesting. The exception of any_cr remains an unresolved mystery. Thanks everyone (except "AI" with its rubbish, which was NOT interesting). Disappointed as usual about the latter.

Re^3: Why is "any" slow in this case?
by LanX (Saint) on Jul 28, 2025 at 12:06 UTC
    Three remarks

    • You didn't need to use $1 etc. at all; a regex returns its captures in list context: my @matches = ( $str =~ /pa(tt)ern/g )
    • I'd expect the trie optimization of alternated numbers directly inside a negative lookahead, (?!(0|15|16|31)\D)(\d+), to be a very fast alternative.°
    • The AI discussion happened in the context of another meditation; I only referenced it here for completeness.

    Happy testing!

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

    °) TIMTOWTDI

      hash => sub {
          while ( $data =~ /^(\d+) (\d+)/mg ) {
              next if exists $hash{$1} or exists $hash{$2};
          }
          return 1;
      },
      ahead => sub {
          while ( $data =~ /^(?!(?:0|15|16|31)\D)(\d+)\N{SPACE}
                            (?!(?:0|15|16|31)\D)(\d+)/mgx ) { }
          return 1;
      },

              Rate  hash ahead
      hash  1742/s    --  -31%
      ahead 2541/s   46%    --
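Before trusting the timings, it may be worth checking that the two variants really select the same pairs. The following is only a sketch: the sample data, the contents of %hash and the array names are assumptions made up for illustration, not taken from the original benchmark.

```perl
use strict;
use warnings;

# Hypothetical sample data: one "col row" pair per line, each line newline-terminated.
my $data = join '', map { "$_->[0] $_->[1]\n" }
    ( [ 0, 5 ], [ 3, 15 ], [ 7, 8 ], [ 150, 2 ], [ 16, 31 ], [ 12, 160 ] );

# Assumed exclusion set, mirroring the numbers used in the lookahead.
my %hash = map { $_ => 1 } ( 0, 15, 16, 31 );

# Variant 1: match every pair, then filter with a hash lookup.
my @by_hash;
while ( $data =~ /^(\d+) (\d+)/mg ) {
    next if exists $hash{$1} or exists $hash{$2};
    push @by_hash, "$1 $2";
}

# Variant 2: let the negative lookahead reject excluded numbers inside the regex.
my @by_ahead;
while ( $data =~ /^(?!(?:0|15|16|31)\D)(\d+)\N{SPACE}
                  (?!(?:0|15|16|31)\D)(\d+)/mgx ) {
    push @by_ahead, "$1 $2";
}

print "hash:  @by_hash\n";
print "ahead: @by_ahead\n";
```

One thing to watch: \D needs an actual character after the number, so an excluded number at the very end of the string (a last line without a trailing newline) would not be rejected by the lookahead. Writing (?!\d) in place of \D would be one way around that.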

      Wow, thanks. I'll refactor with this, then. As to (1), simply generating a list of a few hundred captures is slower, even without working through it in pairs later. Sorry about (3), I didn't mean to reproach anyone.

        > Wow, thanks. I'll refactor with this, then.

        Please test thoroughly, I just hacked the code into my mobile as an example ... also be careful about the numbering of the captures, or use (?:...) for a non-capturing group in the negative list.
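The point about capture numbering can be illustrated with a minimal sketch (the sample string and variable names are made up): a capturing group inside the lookahead still claims $1, so the number itself moves to $2.

```perl
use strict;
use warnings;

my $str = "12 ";    # hypothetical input

# Capturing group inside the lookahead: it claims $1 (left undefined here,
# because the lookahead body did not match), and the number lands in $2.
my @with_capture = ( $str =~ /(?!(0|15|16|31)\D)(\d+)/ );

# Non-capturing (?:...) inside the lookahead: the number is $1 again.
my @without_capture = ( $str =~ /(?!(?:0|15|16|31)\D)(\d+)/ );

printf "capturing: %d values, non-capturing: %d values\n",
    scalar @with_capture, scalar @without_capture;
```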

        > simply generating a list of few hundred captures is slower,

        I can't follow, since you are using the /g modifier, each iteration will only capture 2 groups and then continue where it left off.

        hence my ( $c, $r ) = ( $data =~ /^(\d+) (\d+)/mg ) should nicely do.

        (Haven't tested the performance, but every statement normally counts)
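For reference, as far as perlop documents it, /g behaves differently depending on context, which is probably what the disagreement above is about. A small sketch with made-up data:

```perl
use strict;
use warnings;

my $data = "1 2\n3 4\n5 6\n";    # hypothetical sample data

# List context: /g runs the match to exhaustion and returns every capture.
my @all = ( $data =~ /^(\d+) (\d+)/mg );

# Assigning to two scalars is still list context: all matches are performed,
# and everything after the first two captures is discarded.
my ( $c, $r ) = ( $data =~ /^(\d+) (\d+)/mg );

# Scalar context (a while condition): one match per iteration,
# resuming at pos($data) each time.
my @pairs;
while ( $data =~ /^(\d+) (\d+)/mg ) {
    push @pairs, [ $1, $2 ];
}

print scalar(@all), " captures in list context, ",
    scalar(@pairs), " iterations in scalar context\n";
```

So the list assignment does produce the full capture list before anything is thrown away, while the while loop only ever holds one pair at a time.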

        > I'll refactor

        You initially said that performance wasn't an issue and you were just curious.

        I'd rather recommend going for the clearest code, not the fastest, because in the long run maintenance costs you the most.

        Cheers Rolf

Re^3: Why is "any" slow in this case?
by marto (Cardinal) on Jul 28, 2025 at 11:54 UTC

    "Thanks everyone (except "AI" with its rubbish, which was NOT interesting). Disappointed as usual about the latter."

    👏