Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^9: Why does each() always re-evaluate its argument? ("for_list" )

by NERDVANA (Deacon)
on Dec 07, 2023 at 23:36 UTC ( [id://11156179]=note: print w/replies, xml ) Need Help??


in reply to Re^8: Why does each() always re-evaluate its argument? ("for_list" )
in thread Why does each() always re-evaluate its argument?

  • The pointers to the scalars placed on the stack. But that has to happen no matter the interface.

I believe LanX's performance concern is whether a hash with 100,000 elements must extend the perl stack to 200,000 elements before beginning the loop. It's not a guaranteed implementation detail since, as I recall, for (1..100_000) is optimized to use a counter and repeatedly modify a scalar rather than generating 100_000 distinct scalars at once.

So I think I'm hearing that yes it does need to allocate the perl stack to 200_000 elements, and does not perform some fancy optimized use of 'each'.

Also I would not say that "it uses the built-in iterator" unless it picks up from the current position of the internal iterator, rather than from the start of the hash. @foo= %bar always gives you the full contents of %bar, but while (my @x= each %bar) { push @foo, @x; } could omit elements if a previous 'each' iteration left off in the middle.

Replies are listed 'Best First'.
Re^10: Why does each() always re-evaluate its argument? ("for_list" )
by LanX (Saint) on Dec 08, 2023 at 00:34 UTC
    > unless it picks up from the current position of the internal iterator, rather than from the start of the hash

    That's not desirable and the reason why I was speculating about "localizing" the iterator, i.e. saving the state and restoring it after leaving the loop.

    This would allow to nest two for loops over the same hash. That's not possible while using each and this "global" effect is constantly using headaches when passing around hashes.

    But ... I can't even tell if such a localization can be implemented in a reasonable way.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

      Interestingly that topic came up on perl-porters a while ago and the conclusion is that it would be possible to make standalone iterator objects for hashes in several ways, but with different costs for different modes of failure. For instance, do you throw an error if the hash gets modified during iteration? or just warn and resume the iteration as best as possible? or not even warn? or (for a performance hit) implement a fully intuitive handling of deletions and insertions?

      Several ways are possible right now using XS, without changing any core data structures. But, using the iterator would involve an XS function call, and be less performant than native iteration.

        Yeah. I touched on this in my first post. The only thing Perl lacks is the syntactical sugar to make it nice ...which could probably also be added by a module (but it would be a lot more solid if built in).

        using the iterator would involve an XS function call

        No, that can be optimized away. See Syntax::Feature::Loop and Syntax::Feature::QwComments. Specifically this. (I think were broken by 5.38, but I intend to fix soon.)

        I think consistency should have the top priority, and for_list should act like other for loops do.

        For-loop over an array will silently skip over deleted elements, hence a for-loop over a hash should too. (no tested yet)

        If a warning should be activated, then in both cases alike.

        update

        unfortunately it's NOT consistent

        $ cat tst_del.pl use v5.14; use experimental "for_list"; my %h; @h{"a".."d"} = 1..4; my @a = values %h; my $once; $once = 0; for my $e (@a) { splice @a,0,2 unless $once++; say "A: $e"; } $once = 0; for my ($k,$v) (%h) { delete @h{"a".."b"} unless $once++; say "H: $k => $v"; } say "never reached"; $ perl tst_del.pl A: 1 A: 3 H: a => 1 H: d => 4 Use of freed value in iteration at tst_del.pl line 22. $ perl -v This is perl 5, version 38, subversion 2 (v5.38.2)

        update

        tho, according to from perlsyn it's undefined behavior

        If any part of LIST is an array, "foreach" will get very confused +if you add or remove elements within the loop body, for example with "spl +ice". So don't do that.

        and the demonstrated results are quite unpredictable.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        see Wikisyntax for the Monastery

Re^10: Why does each() always re-evaluate its argument? ("for_list" )
by ikegami (Patriarch) on Dec 08, 2023 at 17:07 UTC

    I believe LanX's performance concern is whether a hash with 100,000 elements must extend the perl stack to 200,000 elements before beginning the loop.

    Extending the stack by one element or by 200,000 has the same fixed performance cost.

    It's cheaper in fact, because it avoids 99,999 checks to see if the stack needs to be extended.

    Also I would not say that "it uses the built-in iterator" unless it picks up from the current position of the internal iterator

    By your logic, keys and values don't use the iterator.

    my %h = ( a => 5, b => 6, c => 7 ); each %h; say for keys( %h );
    b c a

    That's wrong. %h in list context, keys and values all use the iterator. They just reset it first. You can see this by the fact that it's not left in its previous state.

    my %h = ( a => 5, b => 6, c => 7 ); each %h; while ( my $k = each( %h ) ) { say $k; } say "--"; each %h; my @a = %h; while ( my $k = each( %h ) ) { say $k; }

    If the iterator wasn't used, you'd see the same output twice. But because the iterator is used, you see something like

    c a -- b c a

    In fact, keys and values are specifically documented to reset the iterator, and they're specifically document to do this and nothing else in void context. (I don't know if %h in list context is specifcally documented to reset the iterator.)

      It's fine to reset the iterator even if they don't use it, so that's not proof. But, in fact you're right; in Perl_hv_pushkv there is

      EXTEND_MORTAL(nkeys); EXTEND(SP, ext); while ((entry = hv_iternext(hv))) { if (flags & 1) { SV *keysv = newSVhek(HeKEY_hek(entry)); SvTEMP_on(keysv); PL_tmps_stack[++PL_tmps_ix] = keysv; PUSHs(keysv); } if (flags & 2) PUSHs(HeVAL(entry)); }

      I'm rather surprised they didn't optimize that to just iterate the hashtable directly. I'm sure there are some special cases handled by hv_iternext, but they could have just tested for that first.

      Extending the stack by one element or by 200,000 has the same fixed performance cost.

      How do you figure? It has to allocate the current size + 1.6MB of ram and initialize all of it. (and as the code above shows, it also has to grow the temps stack by 800KB and initialize that) Overwriting a pair of scalars in a loop would extend by 2 and probably fall within the size of the existing stack(s) anyway.

        I'm rather surprised they didn't optimize that to just iterate the hashtable directly.

        You mean inline hv_iternext? The compiler will do that if appropriate. But that's unlikely since hv_iternext is quite big. It would make no sense to put a copy of that code in the function you showed.

        How do you figure? It has to allocate the current size + 1.6MB of ram and initialize all of it.

        No initialization is needed.

        Even if caching the whole list is faster than iterating step by step, it'll be slower if a big loop is left early.

        The question is how long that list has to be.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        see Wikisyntax for the Monastery

        ) like with last

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11156179]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (6)
As of 2024-04-18 20:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found