in reply to An iterator for (not "iterating") a recursive data structure.

What about something like (untested, assumption that false, and not undef is a flag for end of data, and that the tree is well-formed (no hashref, for example)):

sub genIterator { my $tree = shift; my @stack = ( [ 0, $tree ], 0 ); return sub { my $work = shift( @stack ); while ( ref( $work ) ) { my $index = $work->[0]; my $worktree = $work->[1]; if ( $index >= scalar( @{$worktree} ) ) { # Update 3 $work = shift( @stack ); next; } elsif ( ! ref( $worktree->[$index] ) ) { unshift( @stack, [ $index+1, $worktree ] ); return $worktree->[$index]; } else { unshift( @stack, [ $index+1, $worktree ] ); $work = [ 0, $worktree->[$index] ]; next; } } return; # empty stack } }

Test Code:

my @data = [ [ 0 ], 1, 2, [ 3, 4, 5 ], 6, [ 7, 8, [ 9, 10 ], [ 11, [ 1 +2 ], 13 ], ], 14 ]; my $it = genIterator( \@data ); use DDP; p @data; while (defined( my $x = $it->() )) { say $x; } __END__ [ [0] [ [0] [ [0] 0 ], [1] 1, [2] 2, [3] [ [0] 3, [1] 4, [2] 5 ], [4] 6, [5] [ [0] 7, [1] 8, [2] [ [0] 9, [1] 10 ], [3] [ [0] 11, [1] [ [0] 12 ], [2] 13 ] ], [6] 14 ] ] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Update: No longer need to assume that '0' is an invalid element, but the use of the iterator needs to be adjusted.

Update 2: Compared to Eily's solution, mine is iterative (non recursive), while Eily's is recursive.

UPdate 3: changed '>' to '>=' after testing, added test code.

--MidLifeXis

Replies are listed 'Best First'.
Re^2: An iterator for (not "iterating") a recursive data structure. (terser)
by tye (Sage) on Feb 12, 2015 at 14:39 UTC

    Note that you can drop the ", 0", the "ref" and both of the "next;"s.

    Since you just need pairs, you can even drop the "["s and "]"s, reducing the code to:

    sub genIterator { my( $root ) = @_; my @stack = ( $root, 0 ); return sub { while( @stack ) { my $index = pop @stack; my $tree = pop @stack; if( ref( $tree->[$index] ) ) { push( @stack, $tree, $index+1, $tree->[$index], 0 ); } elsif( $index < @$tree ) { push( @stack, $tree, $index+1 ); return $tree->[$index]; } } return; # empty stack } }

    (Not some fundamental change, just a minor reworking.)

    - tye        

      We have a winner! Thanks to you and MidLifeXis.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
      In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      It's identical to what you'd use for a breadth-first search (as an iterator or otherwise) except then you'd use a queue instead of a stack for a breadth-first search. In both cases, you need to save the search state in a local var rather than using recursion to save it on the system stack.
      I'm wondering if this could also be done with nested hashes.

      A stored array index has a natural successor, but a hash key hasn't.°

      Using each is accessing an internal iterator "cursor", but can't be used multiple times with the same hash.

      The only solution which comes to mind is to initially cache the keys into an array and to go by index. But this static copy defies the idea of a dynamic iterator.

      Any better idea?

      update

      Tie::Hash requires us to implement a NEXTKEY this, lastkey method. But there doesn't seem to be a way to access this API with a native hash.

      update

      Is (keys %hash)[$idx++] optimized to be performant, or will it create a long list of keys to be thrown away?

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

      °) well there is an internal order but this is a) kept secret and b) shuffled with every new run of perl

        UPDATE: Please don't trust the benchmark, turns out, that when operating on the same hash they are sabotaging each other*

        > Is (keys %hash)[$idx++] optimized to be performant, or will it create a long list of keys to be thrown away?

        it's way faster than each !?!

        UPDATE Sorry I misread the results, it's late. :)

        The approach with (keys %hash)[$idx++] is far slower.

        Compilation started at Fri Jun 19 02:17:48 C:/Perl_524/bin\perl.exe d:/exp/hash_iter.pl *** Testing gen_idx_keys ok 1 - 'gen_idx_keys' iterates over all entries ok 2 - 'gen_idx_keys' cycles and iterates again *** Testing gen_copy_hash_and_each ok 3 - 'gen_copy_hash_and_each' iterates over all entries ok 4 - 'gen_copy_hash_and_each' cycles and iterates again *** Testing gen_copy_keys ok 5 - 'gen_copy_keys' iterates over all entries ok 6 - 'gen_copy_keys' cycles and iterates again *** Testing gen_each ok 7 - 'gen_each' iterates over all entries ok 8 - 'gen_each' cycles and iterates again 1..8 Rate idx_keys copy_keys each copy_has +h_and_each idx_keys 3.05/s -- -100% -100% + -100% copy_keys 2361328/s 77510489% -- -6% + -23% each 2522717/s 82808095% 7% -- + -17% copy_hash_and_each 3047966/s 100049391% 29% 21% + -- Compilation finished at Fri Jun 19 02:18:05

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Re^2: An iterator for (not "iterating") a recursive data structure.
by Eily (Monsignor) on Feb 12, 2015 at 14:18 UTC

    Update 2: Compared to Eily's solution, mine is iterative (non recursive), while Eily's is recursive.
    Yup, and my solution generates a lots of closures on the fly. This would probably make it slower on structures that are deeper than they are wide.

    Your solution doesn't seem to work though, it goes down the levels alright, but never climbs back up. I'm afraid trying to work on this in an iterative way (I kind of see the idea, and I know it's possible) just confuses the perl out of me, so I can't point out what's wrong.

      See update three - had a bad comparitor that was generating undef in the middle of the data. I was going one beyond the end of the array of data.

      --MidLifeXis

Re^2: An iterator for (not "iterating") a recursive data structure.
by BrowserUk (Patriarch) on Feb 12, 2015 at 15:31 UTC

    This is the one I'm using, though I've gone for tye's terser version. Many thanks.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
Re^2: An iterator for (not "iterating") a recursive data structure.
by LanX (Saint) on Feb 13, 2015 at 14:25 UTC
    Some remarks/meditations:

    Solving the initial problem

    Should be noted that the initial problem of the OP (short-circuiting a call-back) is easily solved with goto

    cbIterator { goto STOP if $_ >700; print; } \@nested; STOP:

    Terminology: What's an "Iterator"?

    Than there is some terminology confusion, in my books ( IIRC including Higher Order Perl ) this "pass a call-back" approach isn't an iterator but something I'd maybe call a diver or walker ¹, which does the iteration inside out by processing a call-back.

    I.o.W for me in Perl speak:

    iterator := iterator function

    I checked Iterator on WP and of course there is still confusion, but at least someone proposed Iteratee

    Iteratee, in which, instead of the developer calling the iterator repeatedly to get new values, the iteratee is called repeatedly to process new chunks of data - an example of inversion of control

    Where your solution excels

    The real problem with iterators in Perl is how to continue them after the first stop. (The OP didn't ask this)

    Thats why Perl6 introduces gather/take and Python has "generators" with yield constructs.

    Since you're flattening the recursion into a loop with its own call @stack , this can easily be used for such a "coroutine".

    The trick is to store persisting data like @stack in the surrounding closure of the generator!

    Finally

    Hope this was of interest. :)

    Cheers Rolf

    PS: Je suis Charlie!

    ¹) I'd love to be pointed to some sources clarifying the terminology.

      Should be noted that the initial problem of the OP (short-circuiting a call-back) is easily solved with goto

      Not if you wish to have the option of resuming the iteration; which I do. (Admittedly this was not mentioned in the OP.)

      In addition, this suffers from all the same problems as throwing an exception to effect the early exit, which is (ab)using the exception mechanism for flow control.

      Another reason for wanting an iterator is the ability to have multiple, concurrent, independent iterators; which is key to my application.

      Last but not least is the avoidance of the nightmare that is Inversion of Control.

      Think about how you'd have to restructure so many of your programs -- and how hard, if not impossible that would be -- if Perl's hashes only provided callback iterators. What a mess!


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
      In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

        Amen! Nicely said.

        - tye        

        > Not if you wish to have the option of resuming the iteration;

        That's why I said

        > > Thats why Perl6 introduces gather/take and Python has "generators" with yield constructs.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

      If I were to have implemented this in code, I would have probably wrapped this into something like Iterator::Simple, which wraps the 'iteratee' into an object, and adds syntactic sugar that sets it up to be accessed with ->next(), <$it>, and other sugar and coercions that help to gloss over some of the details. I believe that the object created by the module would fit the definition of the iterator. I do agree that the function I provided should have probably been referred to in my test code as $cb or some such.

      --MidLifeXis

        > should have probably been referred to in my test code as $cb

        I wasn't criticizing you, the whole thread is fuzzy. :)

        edit

        Thx for pointing me to Iterator::Simple :)

        Cheers Rolf

        PS: Je suis Charlie!

Re^2: An iterator for (not "iterating") a recursive data structure.
by LanX (Saint) on Jun 18, 2020 at 20:32 UTC
    >  while (defined( my $x = $it->() )) {

    this whole defined thing to handle false values is IMHO flawed cargo cult, what if undef is a legal value of the data structure???

    the easy solution is a list assignment denoted by parens

     while ( my ( $x ) = $it->() ) {

    the loop will only terminate if the returned list is empty.

    Hence problem solved with even shorter syntax.

    (And returning multiple values becomes trivial, too)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery