Re: Why is "any" slow in this case?

Re^2: Why is "any" slow in this case? by Anonymous Monk on Jul 29, 2025 at 07:12 UTC
> The anon subs in any don't capture any variables, but the ones in any_cr capture two. Introducing capturing adds overhead.

"for_cr" being slower than "for" confirms what you say; there's symmetry between "for_cr vs. for" and "any_cr vs. any". However, "grep_cr" doesn't seem to suffer from this capturing. Is its subroutine very different? Furthermore, injecting `$data;` at the beginning of the blocks, as you did, makes them all slow, including "grep_cr". Capturing `$c` and `$r` is OK, but capturing `$data` is penalised. Something is still amiss.

```
for => sub {
    W: while ( $data =~ /^(\d+) (\d+)/mg ) {
        for ( @skip ) { next W if ( sub { $1 eq $_ } )->() }
        for ( @skip ) { next W if ( sub { $2 eq $_ } )->() }
    }
    return 1;
},
for_cr => sub {
    W: while ( $data =~ /^(\d+) (\d+)/mg ) {
        my ( $c, $r ) = ( $1, $2 );
        for ( @skip ) { next W if ( sub { $c eq $_ } )->() }
        for ( @skip ) { next W if ( sub { $r eq $_ } )->() }
    }
    return 1;
},
grep => sub {
    while ( $data =~ /^(\d+) (\d+)/mg ) {
        next if grep { $1 eq $_ } @skip;
        next if grep { $2 eq $_ } @skip;
    }
    return 1
},
grep_cr => sub {
    while ( $data =~ /^(\d+) (\d+)/mg ) {
        my ( $c, $r ) = ( $1, $2 );
        next if grep { $c == $_ } @skip;
        next if grep { $r == $_ } @skip;
    }
    return 1
},
grep_data => sub {
    while ( $data =~ /^(\d+) (\d+)/mg ) {
        next if grep { $data; $1 eq $_ } @skip;
        next if grep { $data; $2 eq $_ } @skip;
    }
    return 1
},
grep_cr_data => sub {
    while ( $data =~ /^(\d+) (\d+)/mg ) {
        my ( $c, $r ) = ( $1, $2 );
        next if grep { $data; $c == $_ } @skip;
        next if grep { $data; $r == $_ } @skip;
    }
    return 1
},
any => sub {
    while ( $data =~ /^(\d+) (\d+)/mg ) {
        next if any { $1 == $_ } @skip;
        next if any { $2 == $_ } @skip;
    }
    return 1
},
any_cr => sub {
    while ( $data =~ /^(\d+) (\d+)/mg ) {
        my ( $c, $r ) = ( $1, $2 );
        next if any { $c == $_ } @skip;
        next if any { $r == $_ } @skip;
    }
    return 1
},
```

```
               Rate for_cr  for grep_data any_cr grep_cr_data  any grep grep_cr
for_cr       96.4/s     -- -66%      -69%   -70%         -71% -86% -88%    -90%
for           285/s   196%   --       -8%   -13%         -14% -60% -64%    -71%
grep_data     311/s   223%   9%        --    -5%          -6% -56% -61%    -68%
any_cr        326/s   239%  14%        5%     --          -1% -54% -59%    -66%
grep_cr_data  331/s   244%  16%        6%     1%           -- -54% -59%    -66%
any           714/s   641% 150%      129%   119%         116%   -- -11%    -26%
grep          799/s   729% 180%      157%   145%         141%  12%   --    -18%
grep_cr       968/s   905% 239%      211%   197%         192%  36%  21%      --
```
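The benchmark above doesn't show how `$data` and `@skip` were set up. A minimal, self-contained sketch for reproducing a subset of it with the core Benchmark module, assuming hypothetical data (1,000 lines of `N M` pairs) and a hypothetical `@skip` list. Unlike the original subs, each variant here counts the lines it keeps (`$kept`, also hypothetical) instead of returning 1, so the variants can be checked against each other before timing.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(any);

# Hypothetical test data: "N M" pairs, one per line (not from the original post).
my $data = join '', map { sprintf "%d %d\n", $_, ( $_ * 7 ) % 100 } 1 .. 1000;
my @skip = ( 3, 14, 15, 65, 92 );

my %tests = (
    # grep's block is compiled into the caller's op tree: no sub call, no capture.
    grep => sub {
        my $kept = 0;
        while ( $data =~ /^(\d+) (\d+)/mg ) {
            next if grep { $1 == $_ } @skip;
            next if grep { $2 == $_ } @skip;
            $kept++;
        }
        return $kept;
    },
    # any's block is an anonymous sub; reading $1/$2 captures no lexicals.
    any => sub {
        my $kept = 0;
        while ( $data =~ /^(\d+) (\d+)/mg ) {
            next if any { $1 == $_ } @skip;
            next if any { $2 == $_ } @skip;
            $kept++;
        }
        return $kept;
    },
    # Referring to the lexicals $c and $r turns the anon sub into a closure.
    any_cr => sub {
        my $kept = 0;
        while ( $data =~ /^(\d+) (\d+)/mg ) {
            my ( $c, $r ) = ( $1, $2 );
            next if any { $c == $_ } @skip;
            next if any { $r == $_ } @skip;
            $kept++;
        }
        return $kept;
    },
);

# Sanity check: every variant must keep the same number of lines.
my %count = map { $_ => $tests{$_}->() } keys %tests;
print "$_ keeps $count{$_} lines\n" for sort keys %count;

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese( -1, \%tests );
```

Absolute rates will differ from the figures in the post; only the relative ordering of the variants is meaningful, and it may vary by perl version.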
Re^3: Why is "any" slow in this case? by ikegami (Patriarch) on Jul 29, 2025 at 11:50 UTC
> "grep_cr" doesn't seem to suffer from this capturing.

Correct.

`List::Util::any BLOCK LIST` is syntactic sugar for `List::Util::any sub BLOCK, LIST` because of its prototype.

```
$ perl -Mv5.14 -MList::Util=any -e'say 0+any { $_ > 3 } 1..5'
1
$ perl -Mv5.14 -MList::Util=any -e'say 0+any sub { $_ > 3 }, 1..5'
1
```

A sub's access to the variables of the lexical scope in which it's defined is called capturing. (A sub that captures is called a closure.)

That's not the case for `CORE::grep` and `CORE::any`'s blocks. Their blocks are no more subroutines than `while`'s.

```
$ perl -MO=Concise,-exec -MList::Util=any -e'any { /x/ } @a'
1  <0> enter v
2  <;> nextstate(main 31 -e:1) v:{
3  <0> pushmark s
4  <$> anoncode[CV CODE] sRM
5  <#> gv[a] s
6  <1> rv2av[t4] lKM/1
7  <#> gv[any] s
8  <1> entersub[t5] vKS/TARG
9  <@> leave[1 ref] vKP/REFC
-e syntax OK
```

```
$ perl -MO=Concise,-exec -e'grep { /x/ } @a'
1  <0> enter v
2  <;> nextstate(main 1 -e:1) v:{
3  <0> pushmark s
4  <#> gv[*a] s
5  <1> rv2av[t2] lKM/1
6  <@> grepstart K
7  <|> grepwhile(other->8)[t3] vK
8      <0> enter s
9      <;> nextstate(main 2 -e:1) v:{
a      </> match(/"x"/) s
b      <@> leave sKP
           goto 7
c  <@> leave[1 ref] vKP/REFC
-e syntax OK
```

Note the `anoncode` (`sub { }`) in one, and the actual code of the block (`match`) in the other.
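The prototype mechanism described above can be reproduced with a user-defined function. A minimal sketch: `my_any` is a hypothetical stand-in for `List::Util::any` (not its actual implementation) whose `(&@)` prototype lets it be called with a bare block, which Perl compiles into an anonymous sub. The second half relies on the fact that an anonymous sub that captures no lexicals is not cloned, so every evaluation of `sub { ... }` yields a reference to the same code, whereas a capturing one produces a fresh closure each time.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical stand-in for List::Util::any: the (&@) prototype lets callers
# write `my_any { ... } LIST`, which Perl compiles as `my_any sub { ... }, LIST`.
sub my_any (&@) {
    my $code = shift;
    for (@_) {
        return 1 if $code->();    # $_ is aliased to each element in turn
    }
    return 0;
}

my $block_form = my_any { $_ > 3 } 1 .. 5;
my $sub_form   = my_any sub { $_ > 3 }, 1 .. 5;
print "block form: $block_form, sub form: $sub_form\n";

# A non-capturing anon sub is compiled once and shared between evaluations.
# A capturing anon sub (a closure) is cloned on every evaluation.
my ( @plain, @closures );
for my $c ( 1 .. 3 ) {
    push @plain,    sub { $_[0] == 2 };
    push @closures, sub { $_[0] == $c };
}

# Count distinct code addresses; the subs are kept alive in the arrays,
# so addresses cannot be reused.
my ( %plain_addrs, %closure_addrs );
$plain_addrs{ refaddr($_) }++   for @plain;
$closure_addrs{ refaddr($_) }++ for @closures;
printf "distinct plain subs: %d, distinct closures: %d\n",
    scalar keys %plain_addrs, scalar keys %closure_addrs;
```

In the `any_cr` variant, that cloning happens every time `any { $c == $_ }` is evaluated, on top of the sub calls themselves, which is one plausible contributor to its extra cost; a grep block involves neither.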
Re^3: Why is "any" slow in this case? by Anonymous Monk on Jul 31, 2025 at 21:33 UTC
```
grep => sub {
    while ( $data =~ /^(\d+) (\d+)/mg ) {
        next if grep { $1 eq $_ } @skip;
        next if grep { $2 eq $_ } @skip;
    }
    return 1
},
grep_1 => sub {
    while ( $data =~ /^(\d+) (\d+)/mg ) {
        next if grep { 1; $1 eq $_ } @skip;
        next if grep { 1; $2 eq $_ } @skip;
    }
    return 1
},
```

```
        Rate grep_1 grep
grep_1 316/s     -- -61%
grep   811/s   156%   --
```

I'm writing this as an "answer" to myself because I really don't want to ping anyone, let alone request an explanation; these tests are becoming stupid as well as idle.

Apparently "grep" is able to optimise its braces (block) away, sometimes (though not in the case of e.g. `grep { /x/ } @a`, but that's a digression):

```
perl -MO=Concise,-exec -e "grep { $1 eq $_ } @a"
1  <0> enter v
2  <;> nextstate(main 1 -e:1) v:{
3  <0> pushmark s
4  <#> gv[a] s
5  <1> rv2av[t4] lKM/1
6  <@> grepstart K
7  <|> grepwhile(other->8)[t5] vK
8      <#> gvsv[1] s
9      <#> gvsv[_] s
a      <2> seq sK/2
           goto 7
b  <@> leave[1 ref] vKP/REFC
```

The output is the same without a block, i.e. for `grep $1 eq $_, @a`. But:

```
perl -MO=Concise,-exec -e "grep { 1; $1 eq $_ } @a"
1  <0> enter v
2  <;> nextstate(main 1 -e:1) v:{
3  <0> pushmark s
4  <#> gv[a] s
5  <1> rv2av[t4] lKM/1
6  <@> grepstart K
7  <|> grepwhile(other->8)[t5] vK
8      <0> enter s
9      <;> nextstate(main 2 -e:1) v
a      <#> gvsv[1] s
b      <#> gvsv[_] s
c      <2> seq sK/2
d      <@> leave sKP
           goto 7
e  <@> leave[1 ref] vKP/REFC
```

So once grep's block (not really an anon sub, but something else) gets its own `enter`/`leave` scope, it is significantly slower than any's real anon sub, even though neither captures outside variables here; in fact it is about as slow as any's anon sub when that one does capture variables.
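Whether a particular grep block was compiled inline or wrapped in its own `enter`/`leave` scope can also be checked from Perl itself. A rough sketch using the core B module, walking the op tree top-down the same way B::Concise does; `op_names` and `has_enter` are hypothetical helper names, and the expected results are inferred from the Concise output above rather than from any documented guarantee.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

# Collect the names of all ops under $op, recursing through each op's
# children (first kid, then its siblings), as B::Concise's tree walk does.
sub op_names {
    my ( $op, $names ) = @_;
    push @$names, $op->name;
    if ( $op->flags & OPf_KIDS ) {
        for ( my $kid = $op->first; $$kid; $kid = $kid->sibling ) {
            op_names( $kid, $names );
        }
    }
    return $names;
}

# True if the sub's op tree contains an "enter" op, i.e. a full block scope.
sub has_enter {
    my ($code) = @_;
    my $root = svref_2object($code)->ROOT;
    return scalar grep { $_ eq 'enter' } @{ op_names( $root, [] ) };
}

# Same grep, once with a single-statement block, once with two statements.
my $one_statement  = sub { grep { $_ eq 'x' } @_ };
my $two_statements = sub { grep { 1; $_ eq 'x' } @_ };

printf "one statement:  enter=%s\n", has_enter($one_statement)  ? 'yes' : 'no';
printf "two statements: enter=%s\n", has_enter($two_statements) ? 'yes' : 'no';
```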
Re^2: Why is "any" slow in this case? by LanX (Saint) on Jul 28, 2025 at 15:51 UTC
> Every time you read from it, it gets repopulated (the matched substring is copied into it from the matched string).

Do you happen to know why? I can't see any side effects justifying this behaviour.

Cheers Rolf
(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery
Re^3: Why is "any" slow in this case? by ysth (Canon) on Jul 28, 2025 at 16:56 UTC
So the regex engine doesn't waste any time updating globals that may not even get used. Instead, the cost is deferred until the global is accessed, and it is incurred every time it is accessed, because that's how magic works.
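The per-access cost can be made visible with a tied scalar, since `tie` is also implemented through magic. A minimal sketch, assuming a hypothetical `CountingScalar` class whose `FETCH` counts its calls: reading the variable four times runs the getter four times, while copying it once into a plain lexical (as the `_cr` variants in the benchmark do with `$1` and `$2`) runs it once.

```perl
use strict;
use warnings;

# Hypothetical tie class: FETCH plays the role of a magic getter and
# counts how often it is invoked.
package CountingScalar;

sub TIESCALAR {
    my ( $class, $value ) = @_;
    return bless { value => $value, fetches => 0 }, $class;
}

sub FETCH {
    my ($self) = @_;
    $self->{fetches}++;
    return $self->{value};
}

sub STORE {
    my ( $self, $value ) = @_;
    $self->{value} = $value;
}

package main;

my $tied = tie my $magic, 'CountingScalar', 42;

# Four reads of the tied variable: the getter runs on each one.
for ( 1 .. 4 ) {
    my $value = $magic;
}
my $direct_fetches = $tied->{fetches};

# Copy once into an ordinary lexical, then read the copy four times.
$tied->{fetches} = 0;
my $copy = $magic;
for ( 1 .. 4 ) {
    my $value = $copy;
}
my $copied_fetches = $tied->{fetches};

print "direct reads: $direct_fetches fetches; copy once: $copied_fetches fetch\n";
```

Regex capture variables use a different kind of magic than `tie`, so this only models the access pattern, not the actual cost of populating `$1`.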
Re^3: Why is "any" slow in this case? by ikegami (Patriarch) on Jul 28, 2025 at 16:45 UTC
That's how magic variables work. Every time you read a variable with get magic, a getter function is called to populate it first. Every time you write to a variable with set magic, a setter function is called to process the new value afterwards.

```
use v5.40;
use Variable::Magic qw( cast wizard );

my $wiz = wizard(
   get => sub {
      say sprintf 'getter called for %X', refaddr( $_[0] );
      ${ $_[0] } = int( rand( 100 ) );
   },
);

my $var;
say sprintf '`$var` is %X', refaddr( \$var );
cast $var, $wiz;

for ( 1 .. 4 ) {
   say "Loop: $_";
   say "`\$var` has value $var";
}
```

```
`$var` is 5F5668128E20
Loop: 1
getter called for 5F5668128E20
`$var` has value 22
Loop: 2
getter called for 5F5668128E20
`$var` has value 62
Loop: 3
getter called for 5F5668128E20
`$var` has value 44
Loop: 4
getter called for 5F5668128E20
`$var` has value 70
```

The alternative to magic would be to preemptively copy substrings of the matched string into `` $` ``, `$&`, `$'`, `$n`, `$+{name}` and `$-{name}`.
Re^4: Why is "any" slow in this case? by LanX (Saint) on Jul 28, 2025 at 16:54 UTC
Well, the question could be rephrased as: why do we need $1 to be magic when it's read-only? Threading?

> would be to preemptively copy substrings of the matched string into

If this is about avoiding overhead for optional variables, retrieving it just once on demand would be sufficient. The OP is accessing the same $1 four times in a row.

Update: the code examples in the parent post were updated while I was replying.

Cheers Rolf
(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery
Re^5: Why is "any" slow in this case? by ikegami (Patriarch) on Jul 28, 2025 at 18:18 UTC
Re^6: Why is "any" slow in this case? by LanX (Saint) on Jul 28, 2025 at 20:33 UTC