Why is "any" slow in this case?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Why is "any" slow in this case? by ysth (Canon) on Jul 28, 2025 at 02:33 UTC
List::Util's any may be XS, but it still needs to call the Perl subroutine you pass it for each argument, and a sub call is pretty expensive. Perl 5.42 adds experimental any and all operators that should be faster, since they do away with the sub, but I suspect will still be slower than your sequence of ors.
Re: Why is "any" slow in this case? by hippo (Archbishop) on Jul 28, 2025 at 10:25 UTC
In your any and any_cr branches you are running `any` twice when you only need to run it once. If I fold the two together like this:
`any_fold => sub { while ( $data =~ /^(\d+) (\d+)/mg ) { next if any { $1 == $_ || $2 == $_ } @skip; } return 1 },`
`any_cr_fold => sub { while ( $data =~ /^(\d+) (\d+)/mg ) { my ( $c, $r ) = ( $1, $2 ); next if any { $c == $_ || $r == $_ } @skip; } return 1 },`
I get these results:
`any_cr        461/s   --  -13%  -58%  -62%  -68%  -76%
any_cr_fold   527/s  14%    --  -52%  -57%  -63%  -72%
any          1090/s 137%  107%    --  -11%  -24%  -43%
any_fold     1221/s 165%  132%   12%    --  -15%  -36%
ugly         1436/s 212%  172%   32%   18%    --  -25%
ugly_cr      1914/s 315%  263%   76%   57%   33%    --`
Then, since I am on 5.42.0 and can use the new `any` keyword (suggested by ysth) by replacing your use of List::Util with: `no warnings "experimental::keyword_any"; use experimental 'keyword_any';` it gives these even better results:
`              Rate  any any_fold any_cr ugly any_cr_fold ugly_cr
any          1037/s   --     -10%   -20% -26%        -31%    -46%
any_fold     1152/s  11%       --   -11% -18%        -23%    -40%
any_cr       1297/s  25%      13%     --  -7%        -14%    -32%
ugly         1399/s  35%      21%     8%   --         -7%    -27%
any_cr_fold  1506/s  45%      31%    16%   8%          --    -21%
ugly_cr      1914/s  85%      66%    48%  37%         27%      --`
ugly_cr still wins but not by so much. But what's really interesting to me is that with List::Util, any_cr is much slower than any while with the new keyword, any_cr is slightly faster than any. 🦛
Re: Why is "any" slow in this case? by choroba (Cardinal) on Jul 28, 2025 at 09:27 UTC
Note that grep is faster than any (but still slower than `ugly`).
Re: Why is "any" slow in this case? by ikegami (Patriarch) on Jul 28, 2025 at 15:40 UTC
As for `ugly` vs `ugly_cr`, `$1` is a magic variable. Every time you read from it, it gets repopulated (the matched substring is copied into it from the matched string) and subsequently numified. `ugly_cr` is faster because it cuts the number of times that happens by a factor of four. As for `any` vs `any_cr`, the anon subs in `any` don't capture any variables, but the ones in `any_cr` capture two. Introducing capturing adds overhead that's more expensive than the magic on `$1`. Update: Added "and subsequently numified". Update: Confirmed that the overhead from capturing is the culprit, and adjusted the text appropriately. I confirmed this by changing all four `any { ... }` to `any { $data; ... }`. With this change, `any` becomes slower than `any_cr`.
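To make the `ugly` vs `ugly_cr` difference concrete, here is a minimal sketch. The original benchmark code is not quoted in this thread, so the two subs below, their names, the sample `$data`, and the skip values 0, 15, 16, 31 (borrowed from the hash suggestion elsewhere in the thread) are assumptions for illustration only. Both subs compute the same count; the second simply reads each magic capture variable once per match instead of up to four times.

```perl
use strict;
use warnings;

# Hypothetical sample input: one "column row" pair per line (n and 2n).
my $data = join '', map { "$_ " . ( $_ * 2 ) . "\n" } 0 .. 20;

# Assumed shape of "ugly": every comparison reads the magic $1 or $2 again.
sub count_kept_magic {
    my $kept = 0;
    while ( $data =~ /^(\d+) (\d+)/mg ) {
        next if $1 == 0 || $1 == 15 || $1 == 16 || $1 == 31
             || $2 == 0 || $2 == 15 || $2 == 16 || $2 == 31;
        $kept++;
    }
    return $kept;
}

# Assumed shape of "ugly_cr": copy each capture into a plain lexical once,
# then compare the copies, which carry no get magic.
sub count_kept_copied {
    my $kept = 0;
    while ( $data =~ /^(\d+) (\d+)/mg ) {
        my ( $c, $r ) = ( $1, $2 );
        next if $c == 0 || $c == 15 || $c == 16 || $c == 31
             || $r == 0 || $r == 15 || $r == 16 || $r == 31;
        $kept++;
    }
    return $kept;
}

print count_kept_magic(), " ", count_kept_copied(), "\n";
```

Wrapping the two subs in `Benchmark::cmpthese` would be the natural way to compare them, but the numbers will depend on the Perl version and the input.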
Re^2: Why is "any" slow in this case? by Anonymous Monk on Jul 29, 2025 at 07:12 UTC
> The anon subs in any don't capture any variables, but the ones in any_cr capture two. Introducing capturing adds overhead
"for_cr" being slower than "for" confirms what you say; there's symmetry between "for_cr vs. for" and "any_cr vs. any". However, "grep_cr" doesn't seem to suffer from this capturing. Is its subroutine very different? Yet further, injection of "data;" into beginning of blocks, as you did, makes them all slow, including "grep_cr". Capturing "$c" and "$r" is OK, capturing "$data" is penalised. Something is still amiss.
`for => sub { W: while ( $data =~ /^(\d+) (\d+)/mg ) { for ( @skip ) { next W if ( sub { $1 eq $_ })-> ()} for ( @skip ) { next W if ( sub { $2 eq $_ })-> ()} } return 1; },`
`for_cr => sub { W: while ( $data =~ /^(\d+) (\d+)/mg ) { my ( $c, $r ) = ( $1, $2 ); for ( @skip ) { next W if ( sub { $c eq $_ })-> ()} for ( @skip ) { next W if ( sub { $r eq $_ })-> ()} } return 1; },`
`grep => sub { while ( $data =~ /^(\d+) (\d+)/mg ) { next if grep { $1 eq $_ } @skip; next if grep { $2 eq $_ } @skip; } return 1 },`
`grep_cr => sub { while ( $data =~ /^(\d+) (\d+)/mg ) { my ( $c, $r ) = ( $1, $2 ); next if grep { $c == $_ } @skip; next if grep { $r == $_ } @skip; } return 1 },`
`grep_data => sub { while ( $data =~ /^(\d+) (\d+)/mg ) { next if grep { $data; $1 eq $_ } @skip; next if grep { $data; $2 eq $_ } @skip; } return 1 },`
`grep_cr_data => sub { while ( $data =~ /^(\d+) (\d+)/mg ) { my ( $c, $r ) = ( $1, $2 ); next if grep { $data; $c == $_ } @skip; next if grep { $data; $r == $_ } @skip; } return 1 },`
`any => sub { while ( $data =~ /^(\d+) (\d+)/mg ) { next if any { $1 == $_ } @skip; next if any { $2 == $_ } @skip; } return 1 },`
`any_cr => sub { while ( $data =~ /^(\d+) (\d+)/mg ) { my ( $c, $r ) = ( $1, $2 ); next if any { $c == $_ } @skip; next if any { $r == $_ } @skip; } return 1 },`
`               Rate for_cr  for grep_data any_cr grep_cr_data  any grep grep_cr
for_cr       96.4/s     -- -66%      -69%   -70%         -71% -86% -88%    -90%
for           285/s   196%   --       -8%   -13%         -14% -60% -64%    -71%
grep_data     311/s   223%   9%        --    -5%          -6% -56% -61%    -68%
any_cr        326/s   239%  14%        5%     --          -1% -54% -59%    -66%
grep_cr_data  331/s   244%  16%        6%     1%           -- -54% -59%    -66%
any           714/s   641% 150%      129%   119%         116%   -- -11%    -26%
grep          799/s   729% 180%      157%   145%         141%  12%   --    -18%
grep_cr       968/s   905% 239%      211%   197%         192%  36%  21%      --`
Re^3: Why is "any" slow in this case? by ikegami (Patriarch) on Jul 29, 2025 at 11:50 UTC
"grep_cr" doesn't seem to suffer from this capturing. Correct. `List::Util::any BLOCK LIST` is syntactic sugar for `List::Util::any sub BLOCK, LIST` because of its prototype. `$ perl -Mv5.14 -MList::Util=any -e'say 0+any { $_ > 3 } 1..5' 1 $ perl -Mv5.14 -MList::Util=any -e'say 0+any sub { $_ > 3 }, 1..5' 1` [download] A sub's access to the variables of the lexical scope in which its defined is called capturing. (A sub that captures is called a closure.) That's not the case for `CORE::grep` and `CORE::any`'s blocks. Their blocks are no more subroutines than `while`'s. `$ perl -MO=Concise,-exec -MList::Util=any -e'any { /x/ } @a' 1 <0> enter v 2 <;> nextstate(main 31 -e:1) v:{ 3 <0> pushmark s 4 <$> anoncode[CV CODE] sRM 5 <#> gv[a] s 6 <1> rv2av[t4] lKM/1 7 <#> gv[any] s 8 <1> entersub[t5] vKS/TARG 9 <@> leave[1 ref] vKP/REFC -e syntax OK` [download] `$ perl -MO=Concise,-exec -e'grep { /x/ } @a' 1 <0> enter v 2 <;> nextstate(main 1 -e:1) v:{ 3 <0> pushmark s 4 <#> gv[*a] s 5 <1> rv2av[t2] lKM/1 6 <@> grepstart K 7 <\|> grepwhile(other->8)[t3] vK 8 <0> enter s 9 <;> nextstate(main 2 -e:1) v:{ a </> match(/"x"/) s b <@> leave sKP goto 7 c <@> leave[1 ref] vKP/REFC -e syntax OK` [download] Note the `anoncode` (`sub { }`) in one, and the actual code of the block (`match`) in the other.	[reply] [d/l] [select]
Re^3: Why is "any" slow in this case? by Anonymous Monk on Jul 31, 2025 at 21:33 UTC
`grep => sub { while ( $data =~ /^(\d+) (\d+)/mg ) { next if grep { $1 eq $_ } @skip; next if grep { $2 eq $_ } @skip; } return 1 }, grep_1 => sub { while ( $data =~ /^(\d+) (\d+)/mg ) { next if grep { 1; $1 eq $_ } @skip; next if grep { 1; $2 eq $_ } @skip; } return 1 }, grep_1 316/s -- -61% grep 811/s 156% --` [download] Writing as "answer" to self, because I really don't want to ping anyone, moreover request explanation; these tests are becoming stupid in addition to idle. Apparently "grep" is capable to optimise its braces (block) away (sometimes. Though not in case of e.g. `grep { /x/ } @a`, but that's digression): `perl -MO=Concise,-exec -e "grep { $1 eq $_ } @a" 1 <0> enter v 2 <;> nextstate(main 1 -e:1) v:{ 3 <0> pushmark s 4 <#> gv[a] s 5 <1> rv2av[t4] lKM/1 6 <@> grepstart K 7 <\|> grepwhile(other->8)[t5] vK 8 <#> gvsv[1] s 9 <#> gvsv[_] s a <2> seq sK/2 goto 7 b <@> leave[1 ref] vKP/REFC` [download] same output without a block i.e. for `grep $1 eq $_, @a`. But: `perl -MO=Concise,-exec -e "grep { 1; $1 eq $_ } @a" 1 <0> enter v 2 <;> nextstate(main 1 -e:1) v:{ 3 <0> pushmark s 4 <#> gv[a] s 5 <1> rv2av[t4] lKM/1 6 <@> grepstart K 7 <\|> grepwhile(other->8)[t5] vK 8 <0> enter s 9 <;> nextstate(main 2 -e:1) v a <#> gvsv[1] s b <#> gvsv[_] s c <2> seq sK/2 d <@> leave sKP goto 7 e <@> leave[1 ref] vKP/REFC` [download] And so grep's not-really-anon-sub-but-something-else, even if it doesn't capture outside vars here, is significantly slower than any's real-anon-sub when it doesn't capture vars too, and actually as slow as the latter when it captures vars.	[reply] [d/l] [select]
Re^2: Why is "any" slow in this case? by LanX (Saint) on Jul 28, 2025 at 15:51 UTC
> Everytime you read from it, it gets repopulated (the matched substring is copied into it from the matched string).
Do you happen to know why? I can't see any side-effects justifying this behaviour. Cheers Rolf _{(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery}
Re^3: Why is "any" slow in this case? by ysth (Canon) on Jul 28, 2025 at 16:56 UTC
So the regex engine doesn't waste any time updating globals that may not even get used. Instead the cost is deferred until the global is accessed, and incurred every time it is accessed because that's how magic works.
Re^3: Why is "any" slow in this case? by ikegami (Patriarch) on Jul 28, 2025 at 16:45 UTC
That's how magic variables work. Every time you read a variable with get magic, a getter function is called to populate it first. Every time you write to a variable with set magic, a setter function is called to process the new value afterwards.
use v5.40;
use Variable::Magic qw( cast wizard );
my $wiz = wizard(
    get => sub {
        say sprintf 'getter called for %X', refaddr( $_[0] );
        ${ $_[0] } = int( rand( 100 ) );
    },
);
my $var;
say sprintf '`$var` is %X', refaddr( \$var );
cast $var, $wiz;
for ( 1 .. 4 ) {
    say "Loop: $_";
    say "`\$var` has value $var";
}
Output:
`$var` is 5F5668128E20
Loop: 1
getter called for 5F5668128E20
`$var` has value 22
Loop: 2
getter called for 5F5668128E20
`$var` has value 62
Loop: 3
getter called for 5F5668128E20
`$var` has value 44
Loop: 4
getter called for 5F5668128E20
`$var` has value 70
The alternative to magic would be to preemptively copy substrings of the matched string into $`, `$&`, `$'` and `$n`, `$+{name}` and `$-{name}`.
Re^4: Why is "any" slow in this case? by LanX (Saint) on Jul 28, 2025 at 16:54 UTC
Re^5: Why is "any" slow in this case? by ikegami (Patriarch) on Jul 28, 2025 at 18:18 UTC
Re: Why is "any" slow in this case? by sleet (Monk) on Jul 28, 2025 at 06:09 UTC
A hash seems the best option in the case described. It is also faster by several orders of magnitude (for your benchmarks, however accurate it may be). `my %hash; @hash{0, 15, 16, 31} = (); # then add this to your benchmark hash => sub { while ( $data =~ /^(\d+) (\d+)/mg ) { next if exists $hash{$1} or exists $hash{$2}; return 1; } },` [download] `Rate any_cr any ugly ugly_cr hash2 hash any_cr 865/s -- -37% -54% -65% -100% -100% any 1382/s 60% -- -27% -44% -100% -100% ugly 1896/s 119% 37% -- -24% -100% -100% ugly_cr 2489/s 188% 80% 31% -- -100% -100% hash 3084047/s 356493% 222992% 162532% 123813% 26% --` [download]	[reply] [d/l] [select]
Re^2: Why is "any" slow in this case? by Anonymous Monk on Jul 28, 2025 at 06:39 UTC
You have the 'return 1' in the wrong place...
Re^3: Why is "any" slow in this case? by sleet (Monk) on Jul 28, 2025 at 07:44 UTC
oops, my bad: `Rate any_cr any ugly ugly_cr hash any_cr 869/s -- -38% -54% -65% -77% any 1395/s 61% -- -26% -43% -63% ugly 1896/s 118% 36% -- -23% -49% ugly_cr 2465/s 184% 77% 30% -- -34% hash 3725/s 329% 167% 96% 51% --` [download]	[reply] [d/l]
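The corrected code was not posted, so the following is only a sketch of what the hash variant presumably looks like once `return 1` is moved after the loop. The sample `$data`, the `%skip` name, and the `count_kept_hash` wrapper (which counts the surviving lines so the result can be checked) are assumptions, not part of the original benchmark.

```perl
use strict;
use warnings;

# Skip values as keys; the values are irrelevant because only exists() is used.
my %skip;
@skip{ 0, 15, 16, 31 } = ();

# Hypothetical sample input: one "column row" pair per line.
my $data = "1 2\n15 7\n3 16\n4 5\n";

sub count_kept_hash {
    my $kept = 0;
    while ( $data =~ /^(\d+) (\d+)/mg ) {
        # One hash lookup per capture instead of a scan over the skip list.
        next if exists $skip{$1} or exists $skip{$2};
        $kept++;
    }
    return $kept;    # after the loop, so every line is scanned
}

print count_kept_hash(), "\n";
```

One caveat worth keeping in mind: hash keys are compared as strings, so a value such as `015` would not be found under the key `15`, whereas the numeric `==` comparisons would treat them as equal.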
Re^4: Why is "any" slow in this case? by LanX (Saint) on Jul 28, 2025 at 08:44 UTC
Re^2: Why is "any" slow in this case? by Anonymous Monk on Jul 28, 2025 at 11:20 UTC
Thanks for the tip, I'll use hash look-up in refactored version. + I was wrong about numification, the picture remains the same with string comparison (hello, AI). The unexpected outcome (for me) is "never access $1, etc. more than twice per regexp executed, but assign results to throwaway lexicals instead. Even if 'access' is masked/folded in loops". Interesting. The exception of any_cr remains unresolved mystery. Thanks everyone (except "AI" with its rubbish, which was NOT interesting). Disappointed as usual about the latter.	[reply]
Re^3: Why is "any" slow in this case? by LanX (Saint) on Jul 28, 2025 at 12:06 UTC
Three remarks: (1) You didn't need to use $1 etc. at all; a regex will return the captures in list context: `my @matches = ( $str =~ /pa(tt)ern/g )`. (2) I suppose the trie optimization of alternate numbers directly inside a negative lookahead, `(?!(0|15|16|31)\D)(\d+)`, to be a very fast alternative.¹ (3) The AI discussion happened in the context of another meditation; I only referenced it here for completeness. Happy testing! ¹) TIMTOWTDI
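The first two remarks can be combined into one small sketch. The sample `$data` and the variable names are made up for illustration, and the group inside the lookahead is written as non-capturing (unlike the pattern quoted above) so that it does not add extra entries to the returned list. Whether this is actually faster than the other variants is not measured here.

```perl
use strict;
use warnings;

# Hypothetical sample input: one "column row" pair per line.
my $data = "1 2\n15 7\n3 16\n150 4\n";

# A skip value followed by a non-digit. The \D keeps 15 from rejecting 150,
# and assumes every number is followed by a space or a newline.
my $skip = qr/(?:0|15|16|31)\D/;

# In list context, //g returns all captures of all matches as one flat list,
# so the magic variables $1 and $2 are never read.
my @kept = $data =~ /^(?!$skip)(\d+) (?!$skip)(\d+)$/mg;

# Walk the flat list two values at a time.
my @pairs;
while ( my ( $c, $r ) = splice @kept, 0, 2 ) {
    push @pairs, "$c,$r";
}

print "@pairs\n";
```

Note that a list-context `//g` match should not be used directly as a `while` condition, since it would restart from the beginning of the string on every iteration; collecting the list first avoids that.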
Re^4: Why is "any" slow in this case? by Anonymous Monk on Jul 28, 2025 at 14:04 UTC
Re^5: Why is "any" slow in this case? by LanX (Saint) on Jul 28, 2025 at 14:14 UTC
Re^3: Why is "any" slow in this case? by marto (Cardinal) on Jul 28, 2025 at 11:54 UTC
"Thanks everyone (except "AI" with its rubbish, which was NOT interesting). Disappointed as usual about the latter." 👏
Re: Why is "any" slow in this case? by Anonymous Monk on Jul 28, 2025 at 06:15 UTC
Generally speaking: subroutines and iteration are slower than operations. So ugly is always going to be faster than any, unless the ugly operation gets absurdly large, probably hundreds of numbers. Secondly, every time you access $1 or $2, there is a check of the regexp context; it's not just taking some already known value, which adds overhead. So the _cr approach of assigning $1/$2 to a local variable should always end up faster, even compared to just 2 or 3 uses of $1/$2.
Re^2: Why is "any" slow in this case? by LanX (Saint) on Jul 28, 2025 at 08:40 UTC
> every time you access $1 or $2, there is a check of the regexp context,
Could you please elaborate what this means, especially "regexp context"? It rings a bell, but this seems counter-intuitive with a read only value like $1.
Re^3: Why is "any" slow in this case? by ysth (Canon) on Jul 28, 2025 at 16:22 UTC
If you use $1 and then Devel::Peek::Dump it, you will see the values you got stored in it, but that's all ephemeral; the next access will trigger the get magic again, which does a lot of work to get info from where the regex engine stashes it (and leaves it in the $1 SV so the remainder of that operation can use it).
Re: Why is "any" slow in this case? by NERDVANA (Priest) on Jul 31, 2025 at 20:05 UTC
`any { $1 == $_ }` [download] is a coderef that refers only to global variables, so is compiled once and (presumably) only one CV ever exists for this compiled code. `any { $c == $_ }` [download] is a coderef that refers to lexical variables, and so on every iteration it has some overhead to link the lexical scope to the coderef. I don't know the exact details of what that entails, but if you consider the possibility that the 'any' function could store those code references in a list somewhere, I think it means there is a newly allocated CV on every iteration.	[reply] [d/l] [select]
Re^2: Why is "any" slow in this case? by LanX (Saint) on Jul 31, 2025 at 21:21 UTC
`$ perl -MO=Concise,func -E'my $y; { my $x; sub func { my $z; say $x+$y ++$z }}' main::func: 9 <1> leavesub[1 ref] K/REFC,1 ->(end) - <@> lineseq KP ->9 1 <;> nextstate(main 6 -e:1) v:%,us,fea=15 ->2 2 <0> padsv[$z:6,7] vM/LVINTRO ->3 3 <;> nextstate(main 7 -e:1) v:%,us,fea=15 ->4 8 <@> say sK ->9 4 <0> padrange[$x:FAKE:; $y:FAKE:] /range=2 ->5 7 <2> add[t5] sK/2 ->8 5 <2> add[t4] sK/2 ->6 - <0> padsv[$x:FAKE:] s ->- - <0> padsv[$y:FAKE:] s ->5 6 <0> padsv[$z:6,7] s ->7 -e syntax OK` [download] IIRC: LexPads (Lexical Scratchpads) are kind of a hash-like structure, roughly similar to symbol-tables. Each scope of the sub has a Pad starting with 0 for the inner scope with `{ '$z' => SCALARREF }`. $x is in Pad-1, $y in Pad-2. The Pads are inspected starting with Pad-0 to find the reference. (see PadWalker for more) So yes there might be some look up overhead involved, but I'd be surprised if the encountered refs weren't cached at first execution. Cheers Rolf _{(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re^3: Why is "any" slow in this case? by NERDVANA (Priest) on Aug 02, 2025 at 18:32 UTC
My point was this: `use v5.40; my @subs; sub dosomething :prototype(&) { push @subs, $_[0]; } for (1..2) { my $x; dosomething { $x+1 } } for (1..2) { my $x; dosomething { $_+1 } } say for @subs` [download] Output: `CODE(0x56271152c660) CODE(0x5627115470c8) CODE(0x5627115999e0) CODE(0x5627115999e0)` [download] You can see by the addresses that Perl had to allocate a new coderef on each iteration of the first loop, but was able to reuse the coderef on the second loop.	[reply] [d/l] [select]
Re^4: Why is "any" slow in this case? by ikegami (Patriarch) on Aug 05, 2025 at 01:52 UTC
Re^4: Why is "any" slow in this case? by LanX (Saint) on Aug 04, 2025 at 21:28 UTC
Re: Why is "any" slow in this case? by LanX (Saint) on Jul 28, 2025 at 10:20 UTC
FWIW: I tested ChatGPT 4o with this question, see here Re^7: AI in the workplace (... in the Monastery)