Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re^8: Why does each() always re-evaluate its argument? ("for_list" )

by ikegami (Patriarch)
on Dec 07, 2023 at 20:27 UTC ( [id://11156172]=note: print w/replies, xml ) Need Help??


in reply to Re^7: Why does each() always re-evaluate its argument? ("for_list" )
in thread Why does each() always re-evaluate its argument?

It uses the built-in iterator to obtain the keys and values from the hash and push them unto the stack before any looping is performed.

If you want me to be more precise, no extra copying occurs.

  • The keys are copied (i.e. fresh scalars are made for the keys) since they're not found as scalars in the hash. But that has to happen no matter the interface.
  • The pointers to the scalars placed on the stack. But that has to happen no matter the interface.
$ perl -e' use v5.14; use experimental qw( for_list ); my %h = ( a => 4, b => 6 ); my $i = 0; for my ( $k, $v ) ( %h ) { if ( !$i++ ) { ++$h{a}; ++$h{b}; } say "$k:$v"; } ' a:5 b:7
  • Comment on Re^8: Why does each() always re-evaluate its argument? ("for_list" )
  • Download Code

Replies are listed 'Best First'.
Re^9: Why does each() always re-evaluate its argument? ("for_list" )
by NERDVANA (Deacon) on Dec 07, 2023 at 23:36 UTC
    • The pointers to the scalars placed on the stack. But that has to happen no matter the interface.

    I believe LanX's performance concern is whether a hash with 100,000 elements must extend the perl stack to 200,000 elements before beginning the loop. It's not a guaranteed implementation detail since, as I recall, for (1..100_000) is optimized to use a counter and repeatedly modify a scalar rather than generating 100_000 distinct scalars at once.

    So I think I'm hearing that yes it does need to allocate the perl stack to 200_000 elements, and does not perform some fancy optimized use of 'each'.

    Also I would not say that "it uses the built-in iterator" unless it picks up from the current position of the internal iterator, rather than from the start of the hash. @foo= %bar always gives you the full contents of %bar, but while (my @x= each %bar) { push @foo, @x; } could omit elements if a previous 'each' iteration left off in the middle.

      > unless it picks up from the current position of the internal iterator, rather than from the start of the hash

      That's not desirable and the reason why I was speculating about "localizing" the iterator, i.e. saving the state and restoring it after leaving the loop.

      This would allow to nest two for loops over the same hash. That's not possible while using each and this "global" effect is constantly using headaches when passing around hashes.

      But ... I can't even tell if such a localization can be implemented in a reasonable way.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery

        Interestingly that topic came up on perl-porters a while ago and the conclusion is that it would be possible to make standalone iterator objects for hashes in several ways, but with different costs for different modes of failure. For instance, do you throw an error if the hash gets modified during iteration? or just warn and resume the iteration as best as possible? or not even warn? or (for a performance hit) implement a fully intuitive handling of deletions and insertions?

        Several ways are possible right now using XS, without changing any core data structures. But, using the iterator would involve an XS function call, and be less performant than native iteration.

      I believe LanX's performance concern is whether a hash with 100,000 elements must extend the perl stack to 200,000 elements before beginning the loop.

      Extending the stack by one element or by 200,000 has the same fixed performance cost.

      It's cheaper in fact, because it avoids 99,999 checks to see if the stack needs to be extended.

      Also I would not say that "it uses the built-in iterator" unless it picks up from the current position of the internal iterator

      By your logic, keys and values don't use the iterator.

      my %h = ( a => 5, b => 6, c => 7 ); each %h; say for keys( %h );
      b c a

      That's wrong. %h in list context, keys and values all use the iterator. They just reset it first. You can see this by the fact that it's not left in its previous state.

      my %h = ( a => 5, b => 6, c => 7 ); each %h; while ( my $k = each( %h ) ) { say $k; } say "--"; each %h; my @a = %h; while ( my $k = each( %h ) ) { say $k; }

      If the iterator wasn't used, you'd see the same output twice. But because the iterator is used, you see something like

      c a -- b c a

      In fact, keys and values are specifically documented to reset the iterator, and they're specifically document to do this and nothing else in void context. (I don't know if %h in list context is specifcally documented to reset the iterator.)

        It's fine to reset the iterator even if they don't use it, so that's not proof. But, in fact you're right; in Perl_hv_pushkv there is

        EXTEND_MORTAL(nkeys); EXTEND(SP, ext); while ((entry = hv_iternext(hv))) { if (flags & 1) { SV *keysv = newSVhek(HeKEY_hek(entry)); SvTEMP_on(keysv); PL_tmps_stack[++PL_tmps_ix] = keysv; PUSHs(keysv); } if (flags & 2) PUSHs(HeVAL(entry)); }

        I'm rather surprised they didn't optimize that to just iterate the hashtable directly. I'm sure there are some special cases handled by hv_iternext, but they could have just tested for that first.

        Extending the stack by one element or by 200,000 has the same fixed performance cost.

        How do you figure? It has to allocate the current size + 1.6MB of ram and initialize all of it. (and as the code above shows, it also has to grow the temps stack by 800KB and initialize that) Overwriting a pair of scalars in a loop would extend by 2 and probably fall within the size of the existing stack(s) anyway.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11156172]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (3)
As of 2024-04-20 01:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found