An iterator for (not "iterating") a recursive data structure.

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: An iterator for (not "iterating") a recursive data structure. by MidLifeXis (Monsignor) on Feb 12, 2015 at 14:00 UTC
What about something like (untested, ~~assumption that false, and not undef is a flag for end of data, and that~~ the tree is well-formed (no hashref, for example)): sub genIterator { my $tree = shift; my @stack = ( [ 0, $tree ], 0 ); return sub { my $work = shift( @stack ); while ( ref( $work ) ) { my $index = $work->[0]; my $worktree = $work->[1]; if ( $index >= scalar( @{$worktree} ) ) { # Update 3 $work = shift( @stack ); next; } elsif ( ! ref( $worktree->[$index] ) ) { unshift( @stack, [ $index+1, $worktree ] ); return $worktree->[$index]; } else { unshift( @stack, [ $index+1, $worktree ] ); $work = [ 0, $worktree->[$index] ]; next; } } return; # empty stack } } [download] Test Code: `my @data = [ [ 0 ], 1, 2, [ 3, 4, 5 ], 6, [ 7, 8, [ 9, 10 ], [ 11, [ 1 +2 ], 13 ], ], 14 ]; my $it = genIterator( \@data ); use DDP; p @data; while (defined( my $x = $it->() )) { say $x; } __END__ [ [0] [ [0] [ [0] 0 ], [1] 1, [2] 2, [3] [ [0] 3, [1] 4, [2] 5 ], [4] 6, [5] [ [0] 7, [1] 8, [2] [ [0] 9, [1] 10 ], [3] [ [0] 11, [1] [ [0] 12 ], [2] 13 ] ], [6] 14 ] ] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14` [download] Update: No longer need to assume that '0' is an invalid element, but the use of the iterator needs to be adjusted. Update 2: Compared to Eily's solution, mine is iterative (non recursive), while Eily's is recursive. UPdate 3: changed '>' to '>=' after testing, added test code. --MidLifeXis	[reply] [d/l] [select]
Re^2: An iterator for (not "iterating") a recursive data structure. (terser) by tye (Sage) on Feb 12, 2015 at 14:39 UTC
Note that you can drop the ", 0", the "ref" and both of the "next;"s. Since you just need pairs, you can even drop the "["s and "]"s, reducing the code to: `sub genIterator { my( $root ) = @_; my @stack = ( $root, 0 ); return sub { while( @stack ) { my $index = pop @stack; my $tree = pop @stack; if( ref( $tree->[$index] ) ) { push( @stack, $tree, $index+1, $tree->[$index], 0 ); } elsif( $index < @$tree ) { push( @stack, $tree, $index+1 ); return $tree->[$index]; } } return; # empty stack } }` [download] (Not some fundamental change, just a minor reworking.) - tye	[reply] [d/l]
Re^3: An iterator for (not "iterating") a recursive data structure. (Winner!) by BrowserUk (Patriarch) on Feb 12, 2015 at 15:01 UTC
We have a winner! Thanks to you and MidLifeXis. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked	[reply]
Re^3: An iterator for (not "iterating") a recursive data structure. (terser) by ikegami (Patriarch) on Feb 13, 2015 at 15:30 UTC
It's identical to what you'd use for a breadth-first search (as an iterator or otherwise) except then you'd use a queue instead of a stack for a breadth-first search. In both cases, you need to save the search state in a local var rather than using recursion to save it on the system stack.	[reply]
Re^3: An iterator for (not "iterating") a recursive data structure. (index into keys?) by LanX (Saint) on Jun 18, 2020 at 20:48 UTC
I'm wondering if this could also be done with nested hashes. A stored array index has a natural successor, but a hash key hasn't.° Using `each` is accessing an internal iterator "cursor", but can't be used multiple times with the same hash. The only solution which comes to mind is to initially cache the `keys` into an array and to go by index. But this static copy defies the idea of a dynamic iterator. Any better idea? update Tie::Hash requires us to implement a `NEXTKEY this, lastkey` method. But there doesn't seem to be a way to access this API with a native hash. update Is `(keys %hash)[$idx++]` optimized to be performant, or will it create a long list of keys to be thrown away? Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery} °) well there is an internal order but this is a) kept secret and b) shuffled with every new run of perl	[reply] [d/l] [select]
Re^4: An iterator for (not "iterating") a recursive data structure. (updated x 2) by LanX (Saint) on Jun 19, 2020 at 00:32 UTC
Re^2: An iterator for (not "iterating") a recursive data structure. by Eily (Monsignor) on Feb 12, 2015 at 14:18 UTC
Update 2: Compared to Eily's solution, mine is iterative (non recursive), while Eily's is recursive. Yup, and my solution generates a lots of closures on the fly. This would probably make it slower on structures that are deeper than they are wide. Your solution doesn't seem to work though, it goes down the levels alright, but never climbs back up. I'm afraid trying to work on this in an iterative way (I kind of see the idea, and I know it's possible) just confuses the perl out of me, so I can't point out what's wrong.	[reply]
Re^3: An iterator for (not "iterating") a recursive data structure. by MidLifeXis (Monsignor) on Feb 12, 2015 at 14:25 UTC
See update three - had a bad comparitor that was generating undef in the middle of the data. I was going one beyond the end of the array of data. --MidLifeXis	[reply]
Re^2: An iterator for (not "iterating") a recursive data structure. by BrowserUk (Patriarch) on Feb 12, 2015 at 15:31 UTC
This is the one I'm using, though I've gone for tye's terser version. Many thanks. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked	[reply]
Re^2: An iterator for (not "iterating") a recursive data structure. by LanX (Saint) on Feb 13, 2015 at 14:25 UTC
Some remarks/meditations: Solving the initial problem Should be noted that the initial problem of the OP (short-circuiting a call-back) is easily solved with `goto` `cbIterator { goto STOP if $_ >700; print; } \@nested; STOP:` [download] <Reveal this spoiler or all in this thread> Terminology: What's an "Iterator"? Than there is some terminology confusion, in my books ( IIRC including Higher Order Perl ) this "pass a call-back" approach isn't an iterator but something I'd maybe call a diver or walker ¹, which does the iteration inside out by processing a call-back. I.o.W for me in Perl speak: iterator := iterator function I checked Iterator on WP and of course there is still confusion, but at least someone proposed Iteratee Iteratee, in which, instead of the developer calling the iterator repeatedly to get new values, the iteratee is called repeatedly to process new chunks of data - an example of inversion of control Where your solution excels The real problem with iterators in Perl is how to continue them after the first stop. (The OP didn't ask this) Thats why Perl6 introduces gather/take and Python has "generators" with yield constructs. Since you're flattening the recursion into a loop with its own call `@stack` , this can easily be used for such a "coroutine". The trick is to store persisting data like `@stack` in the surrounding closure of the generator! Finally Hope this was of interest. :) Cheers Rolf PS: Je suis Charlie! ¹) I'd love to be pointed to some sources clarifying the terminology.	[reply] [d/l] [select]
Re^3: An iterator for (not "iterating") a recursive data structure. by BrowserUk (Patriarch) on Feb 13, 2015 at 15:29 UTC
Should be noted that the initial problem of the OP (short-circuiting a call-back) is easily solved with goto Not if you wish to have the option of resuming the iteration; which I do. (Admittedly this was not mentioned in the OP.) In addition, this suffers from all the same problems as throwing an exception to effect the early exit, which is (ab)using the exception mechanism for flow control. Another reason for wanting an iterator is the ability to have multiple, concurrent, independent iterators; which is key to my application. Last but not least is the avoidance of the nightmare that is Inversion of Control. Think about how you'd have to restructure so many of your programs -- and how hard, if not impossible that would be -- if Perl's hashes only provided callback iterators. What a mess! With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked	[reply]
Re^4: An iterator for (not "iterating") a recursive data structure. (callbacks--) by tye (Sage) on Feb 13, 2015 at 19:50 UTC
Re^4: An iterator for (not "iterating") a recursive data structure. by LanX (Saint) on Jul 08, 2019 at 18:17 UTC
Re^3: An iterator for (not "iterating") a recursive data structure. by MidLifeXis (Monsignor) on Feb 13, 2015 at 14:38 UTC
If I were to have implemented this in code, I would have probably wrapped this into something like Iterator::Simple, which wraps the 'iteratee' into an object, and adds syntactic sugar that sets it up to be accessed with `->next()`, `<$it>`, and other sugar and coercions that help to gloss over some of the details. I believe that the object created by the module would fit the definition of the iterator. I do agree that the function I provided should have probably been referred to in my test code as `$cb` or some such. --MidLifeXis	[reply] [d/l] [select]
Re^4: An iterator for (not "iterating") a recursive data structure. by LanX (Saint) on Feb 13, 2015 at 14:43 UTC
Re^5: An iterator for (not "iterating") a recursive data structure. by MidLifeXis (Monsignor) on Feb 13, 2015 at 14:49 UTC
Re^2: An iterator for (not "iterating") a recursive data structure. by LanX (Saint) on Jun 18, 2020 at 20:32 UTC
> `while (defined( my $x = $it->() )) {` this whole `defined` thing to handle false values is IMHO flawed cargo cult, what if `undef` is a legal value of the data structure??? the easy solution is a list assignment denoted by parens `while ( my ( $x ) = $it->() ) {` the loop will only terminate if the returned list is empty. Hence problem solved with even shorter syntax. (And returning multiple values becomes trivial, too) Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re: An iterator for (not "iterating") a recursive data structure. by Eily (Monsignor) on Feb 12, 2015 at 13:43 UTC
This is pretty rough, and only works if undef is not a possible value, and empty arrays are not possible. But you should have no problem to adapt it: #! perl use v5.14;; use Data::Dump qw[ pp ]; sub genNested { my $n = shift; map{ rand() < 0.5 ? [ rand(1000), genNested( $n-1 ) ] : rand( 1000 + ) } 1 .. $n; } my @nested = genNested 3; pp \@nested; my @flat = 1..10; sub genIterator(+) { my $array = shift; my $index = 0; # Current index in the list my $child; # Iterator for the level below if the current element is + an array return sub { my $element; if (defined $child) { $element = $child->(); } if (!defined $element) { $child = undef; $element = $index < @$array ? $array->[$index++] : undef; if (ref $element eq 'ARRAY') { $child = genIterator($element); $element = $child->(); } } return $element; } } my $next = genIterator @nested; say while $_ = $next->(); [download]	[reply] [d/l]
Re^2: An iterator for (not "iterating") a recursive data structure. by ikegami (Patriarch) on Jul 09, 2019 at 04:29 UTC
This [...] only works if undef is not a possible value, and empty arrays are not possible. It doesn't even handle false values (without a minor change). The following handles false values (including undef) as well as empty arrays: sub genIterator { my $array = shift; my $index = 0; # Current index into $array my $child_iter; # Iterator for array at $array[$index] return sub { while (1) { # If we're iterating through descendants of a child, if (defined $child_iter) { # If the there is a descendant we haven't returned, if ( my ($element) = $child->() ) { return $element; } # No more descendants for this child. $child_iter = undef; } # Signal end of iteration if done. return if $index >= @$array; # Return next child if it's not an array ref. my $element = $array->[$index++]; return $element if ref $element ne 'ARRAY'; # Prepare to iterate through descendants of child. $child_iter = genIterator($element); } } } my $iter = genIterator \@nested; say while ($_) = $iter->(); [download]	[reply] [d/l]
Re^3: An iterator for (not "iterating") a recursive data structure. by Eily (Monsignor) on Jul 09, 2019 at 07:08 UTC
++. It looks like I forgot to test my code for false values (it took me a while to see where they wouldn't work, it was the very last line where I iterate while the value is true). Thanks for the correction, for some reason I'm still not sure I would have thought of it even nowadays. I suppose I don't really think about using the number of returned values when I'm expecting a scalar, rather than a list	[reply]
Re^4: An iterator for (not "iterating") a recursive data structure. by LanX (Saint) on Jul 09, 2019 at 08:46 UTC
Re^5: An iterator for (not "iterating") a recursive data structure. by Eily (Monsignor) on Jul 09, 2019 at 09:29 UTC
Some notes below your chosen depth have not been shown here
Re^2: An iterator for (not "iterating") a recursive data structure. by LanX (Saint) on Feb 13, 2015 at 13:45 UTC
> and only works if undef is not a possible value If you are suffering from the undef is a value problem¹, than call iterators in list context. Re: list context (iterators). I have to admit that I didn't try to understand your code , though! :) Cheers Rolf PS: Je suis Charlie! ¹) aka "0 but true" nonsense	[reply]
Re^3: An iterator for (not "iterating") a recursive data structure. by Eily (Monsignor) on Feb 13, 2015 at 15:00 UTC
Though it kind of bothered me at first, now I really like the way python handles the semipredicate problem, which is to throw an exception at the end of the iteration rather than try to return a specific value (which may end up as part of an iterable). Besides, I think that python's yield would have been one of the easiest ways to solve this problem. For those who don't know about yield it stops the execution of a function (it can be in the middle of an infinite loop) which will be resumed when the next value is fetched. Perl 6's lazy lists can be used in a similar way according to its faq As for understanding my code, the concept is actually fairly simple (well, to me it is, but I'm used to working with closures). I started with a flat list iterator: a closure that holds the array and the current index. In the nested list version, each flat list iterator has a children which is an iterator for the current element if the element itself is an array. You could see it as equivalent to this: `def next: if array[index].isAnArray() and array[index].hasNext() ret = array[index].next(); else ret = array[index] endif index++ end` [download] . This means that the calling stack for a deeply nested list will be something like: `parent.next() child.next() grandChild.next() ...` [download] And so each time a value is fetched, which is why an iterative solution like MidLifeXis's can be far more effective.	[reply] [d/l] [select]
Run-time warning, error and delayed exception handling in Perl 6 (was Re^4: An iterator for (not "iterating") a recursive data structure.) by raiph (Deacon) on Feb 14, 2015 at 22:36 UTC
Re^4: An iterator for (not "iterating") a recursive data structure. by LanX (Saint) on Feb 13, 2015 at 15:03 UTC
Re^2: An iterator for (not "iterating") a recursive data structure. by BrowserUk (Patriarch) on Feb 12, 2015 at 15:29 UTC
This is akin to the route I was going, but I just couldn't get it right. Thanks. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked	[reply]
Re: An iterator for (not "iterating") a recursive data structure. by salva (Canon) on Feb 12, 2015 at 14:14 UTC
A stack of indexes? In order to avoid browsing the data structure from the root every time, you may want to keep a stack of pairs (reference, index) instead.	[reply]
Re: An iterator for (not "iterating") a recursive data structure. by karlgoethebier (Abbot) on Feb 12, 2015 at 15:27 UTC
"...AoA[oA[oA]...]]...iterate over the leaves...quit the iteration at an arbitrary point" Perhaps Data::Rmap does the job? `use strict; use warnings; use Data::Rmap qw(rmap); my @data = [ [0], 1, 2, [ 3, 4, 5 ], 6, [ 7, 8, [ 9, 10 ], [ 11, [12] +, 13 ], ], 14 ]; my $something = 5; eval { rmap { $_ <= $something ? print : die; } @data; }; __END__ monks>rmap.pl 0 1 2 3 4 5` [download] Edit: Forgot to copy&paste something ;-) ...and fixed some entities Best regards, Karl «The Crux of the Biscuit is the Apostrophe»	[reply] [d/l]
Re: An iterator for (not "iterating") a recursive data structure. by Anonymous Monk on Feb 12, 2015 at 14:05 UTC
Which is fine unless you need to be able to quit the iteration at an arbitrary point. (ie. short circuit rather completing the full iteration). Well in functional languages a simple way to do it would be to throw an exception when want to exit recursion (maybe not in weird languages like Haskell). So maybe consider just dying inside the callback when you're done with it, and catch with eval.	[reply]
Re: An iterator for (not "iterating") a recursive data structure. by LanX (Saint) on Jul 13, 2019 at 18:49 UTC
I found all solutions so far overly complicated, what am I missing? please note that the maximal length of the queue is only the sum of children at the current descend path, not all tree elements. (e.g. a binary tree with 10 levels has 2*10 = 2014 nodes but the queue will never exceed 20 = 102 elements) Changing from `unshift` to `push` will perform a breadth first search. use strict; use warnings; use Data::Dump qw/pp dd/; print "\n---------- Input\n"; sub genNested { my $n = shift; map{ rand() < 0.5 ? [ int rand(20), genNested( $n-1 ) ] : int rand +( 20 ) } 1 ..$n; } my @nested = genNested( 3 ); print pp \@nested; print "\n\n---------- my solution\n"; sub genIterator { my $tree = shift; my @queue = @{$tree}; return sub { while ( my ($node) = splice @queue,0,1 ) { # shift is buggy if ( ref $node eq 'ARRAY' ) { unshift @queue, @{$node}; next; } else { return $node } } return; }; } my $iter = genIterator( \@nested ); while ( my ($next) = $iter->() ) { print "$next,\t"; } print "\n\n---------- BUKs goal\n"; sub cbIterator (&$) { my( $code, $tree ) = @_; for my $i ( 0 .. $#$tree ) { if( ref( $tree->[ $i ] ) eq 'ARRAY' ) { &cbIterator( $code, $tree->[ $i ] ); } else { local $_ = $tree->[ $i ]; $code->(); } } } cbIterator { print "$_,\t"; } \@nested; [download] `---------- Input [ [934, [993, 105], [167, [213]]], [71, [120, [973]], 135], [404, [828, [571]], 945], ] ---------- my solution 934, 993, 105, 167, 213, 71, 120, 973, 135, + 404, 828, 571, 945, ---------- BUKs goal 934, 993, 105, 167, 213, 71, 120, 973, 135, + 404, 828, 571, 945,` [download] Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery FootballPerl is like chess, only without the dice} update fixed issue with strange shift behaviour. update changed from shift to splice to handle bug with false nodes	[reply] [d/l] [select]
Re: An iterator for (not "iterating") a recursive data structure. by Anonymous Monk on Feb 14, 2015 at 22:51 UTC
There is the simple "flatten as you go" approach: `sub genIterator { my @stack = @_; return sub { splice @stack, 0, 1, @{$stack[0]} while 'ARRAY' eq ref $stack[0]; shift @stack; } }` [download]	[reply] [d/l]
Re^2: An iterator for (not "iterating") a recursive data structure. by BrowserUk (Patriarch) on Feb 14, 2015 at 23:02 UTC
S'fine for small AoAoA..s, but the top level array in my application will contain upto 200 million elements -- circa. 7GB -- duplicating that on a stack would blow my memory. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked	[reply]
Re^3: An iterator for (not "iterating") a recursive data structure. by LanX (Saint) on Jul 13, 2019 at 20:13 UTC
> duplicating that on a stack would blow my memory. It's not duplicating the memory, just caching the current path from root to leave, very similar to my solution here Worst case scenario with a balanced two level "tree" would mean approx `@stack == 2*sqrt(@nodes)` . For 200 million elements that's about 28000 elements. If that's still to many store a lazy iterator in the @stack returning the children of one node (like a function ref or an object or a tied array) and process it in the iterator. Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery FootballPerl is like chess, only without the dice}	[reply] [d/l]

update

update

Solving the initial problem

Terminology: What's an "Iterator"?

Where your solution excels

Finally

update

update