in reply to The Virtue of Laziness

Replying to myself for a summary, hope that's ok.

Preface: I'm not trying to solve any real-world problems. Please don't take these thoughts too seriously.

The questions asked in this thread led me to a clearer understanding of what I was actually asking for. It is not only about "how to exit a loop as quickly as possible", as it might appear from my original post; that is really only half of the question. Consider subs that naturally should "short circuit" when applied to a list:

When considering the performance of such implementations, we obviously should not blame them for the effort needed to place their arguments onto the stack. Therefore the arrays should be passed by reference to the implementing sub. This is where the first misunderstanding stems from: the term "lazy deref" in my OP implies that an array ref is needed to be efficient at all.
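To illustrate the stack-cost point above, here is a minimal sketch (sub names are mine, not from the OP) of the same short-circuit check written twice: once taking a flat list, once taking a reference.

```perl
use strict;
use warnings;

# Flat-list version: every element is copied onto the argument stack
# before the sub even starts, so an early exit saves nothing up front.
sub first_zero_list {
    for (@_) { return 1 if !$_; }
    return 0;
}

# Reference version: only one scalar (the ref) is passed; the loop can
# bail out after inspecting a single element.
sub first_zero_ref {
    my ($aref) = @_;
    for (@$aref) { return 1 if !$_; }
    return 0;
}

my @data = (0, (1) x 1_000_000);
first_zero_list(@data);   # pays for 1_000_001 stack copies regardless
first_zero_ref(\@data);   # passes a single reference, then exits early
```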

Next, I'd like to extend the scenario to a more realistic one. It's not about dying as fast as you can; it's about efficiently processing a (large) list while at the same time being able to terminate the processing as early as possible. Let's take a look at all in its simplest form: are all values of a list/array non-zero?

The question I should have asked in the beginning:

How can we implement an array-processing sub that is fast when the whole array has to be processed, and fast too if there is an early short circuit?

Some questions from this thread:

Let's focus on the all example. Implementations that come to mind are all of List::Util, List::Gen and a simple loop. These behave very differently in the two situations.

Here is a comparison of these implementations, along with an alternative to the map/grep/map approach.

Hopefully I got rid of the non-standard features that were mindlessly used in my OP, as noted by Danny in Re: The Virtue of Laziness.

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

use List::Util ();
use List::Gen qw(:modify :filter);
use Benchmark 'cmpthese';

for my $first (0, 1) {
    my $list = [$first, (1) x (1_000_000)];
    say "first: $first";
    cmpthese(0, {
        lu  => sub { all_lu($list) },
        for => sub { all_for($list) },
        lg  => sub { all_gen($list) },
    });
    say '';
}

say "chaining:";
my @list = (1 .. 10_000);

# Lookahead (default) would generate one more value
local $List::Gen::LOOKAHEAD = 0;

cmpthese(0, {
    mgm => sub {
        (map {$_->[0]} grep {!($_->[1] % 10)} map {[$_, tf($_)]} @list)[0]
    },
    gfg => sub {
        (gen {$_->[0]} filter {!($_->[1] % 10)} gen {[$_, tf($_)]} \@list)->[0]
    },
});

# artificially "heavy" transformation
sub tf {
    select(undef, undef, undef, 0.000001);
    $_[0] + 1;
}

# short-circuit "all"
sub all_lu {
    List::Util::all {$_} @{shift()};
}

# short-circuit for-loop
sub all_for {
    for (@{shift()}) {
        return 0 unless $_;
    }
    1;
}

# short-circuit lazy generator
sub all_gen {
    !List::Gen(shift)->do(sub {&last(1) unless $_});
}

__DATA__
first: 0
             Rate       lu     lg    for
lu          396/s       --   -99%  -100%
lg        27574/s    6869%     --  -100%
for    10156692/s 2567064% 36734%     --

first: 1
      Rate    lg  for   lu
lg  4.62/s    -- -90% -93%
for 48.5/s  949%   -- -31%
lu  70.6/s 1427%  46%   --

chaining:
      Rate    mgm   gfg
mgm 12.5/s     -- -100%
gfg 3790/s 30320%    --

List::Util would certainly win both races if it operated on array refs.
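List::Util has no array-ref interface today, so to make the claim concrete, here is a hypothetical pure-Perl sketch of what such an interface could look like (the name all_ref and its signature are my invention, not part of List::Util):

```perl
use strict;
use warnings;

# Hypothetical: an "all" that takes a code ref and an array reference,
# so the caller never flattens the array onto the argument stack.
# (List::Util itself offers no such API; this is only an illustration.)
sub all_ref {
    my ($code, $aref) = @_;
    for (@$aref) {
        return 0 unless $code->($_);
    }
    return 1;
}

my @big = (0, (1) x 1_000_000);
# Returns immediately after the first element, and the call itself
# only passed two scalars: a code ref and an array ref.
my $ok = all_ref(sub { $_[0] }, \@big);
```

The real List::Util::all is XS and would presumably still beat this in the "first: 1" (full scan) case; the sketch only shows how the stack cost disappears.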

All in all, I don't see the need for a new feature anymore. The overhead of putting a large array onto the stack can be avoided with array references, and chainable lazy generators are available via List::Gen.

Everything would be fine if List::Gen were maintained. The latest release dates back to 2011, and its unit tests have been failing since perl v5.19. As far as I can see, changes in perl broke the tests themselves, not the tested functionality. I was able to fix all but three tests easily.

Greetings,
🐻

$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$

Re^2: The Virtue of Laziness
by hv (Prior) on May 12, 2024 at 23:49 UTC

    Thanks, that looks like a much more compelling benchmark. I can well imagine Paul Evans seeing the first two benchmarks and being inspired to make List::Util::all (and related functions) competitive with for for the "first: 0" case. But hopefully not before completing the perl release that's due soon. :)

    I believe the special handling of for (@array) is switched in by the peephole optimizer; I'm not sure whether that means it would need a new op for each of the relevant List::Util functions or whether adding one just for reduce() would be sufficient.

    I'd recommend posting that benchmark (without the "chaining" section) as part of a feature request on GitHub, something like "make List::Util functions as fast as 'for (@array)'".

    The "chaining" example is another matter: I think solving that would need something a lot closer to the generalized "lazy evaluation" I mentioned before. It may be worth asking about that too, but I'd suggest doing so in a separate ticket to avoid muddying the waters.