Re: An iterator for (not "iterating") a recursive data structure.
by MidLifeXis (Monsignor) on Feb 12, 2015 at 14:00 UTC
|
What about something like (untested, assumption that false, and not undef is a flag for end of data, and that the tree is well-formed (no hashref, for example)):
sub genIterator {
my $tree = shift;
my @stack = ( [ 0, $tree ], 0 );
return sub {
my $work = shift( @stack );
while ( ref( $work ) ) {
my $index = $work->[0];
my $worktree = $work->[1];
if ( $index >= scalar( @{$worktree} ) ) { # Update 3
$work = shift( @stack );
next;
}
elsif ( ! ref( $worktree->[$index] ) ) {
unshift( @stack, [ $index+1, $worktree ] );
return $worktree->[$index];
}
else {
unshift( @stack, [ $index+1, $worktree ] );
$work = [ 0, $worktree->[$index] ];
next;
}
}
return; # empty stack
}
}
Test Code:
my @data = [ [ 0 ], 1, 2, [ 3, 4, 5 ], 6, [ 7, 8, [ 9, 10 ], [ 11, [ 1
+2 ], 13 ], ], 14 ];
my $it = genIterator( \@data );
use DDP;
p @data;
while (defined( my $x = $it->() )) {
say $x;
}
__END__
[
[0] [
[0] [
[0] 0
],
[1] 1,
[2] 2,
[3] [
[0] 3,
[1] 4,
[2] 5
],
[4] 6,
[5] [
[0] 7,
[1] 8,
[2] [
[0] 9,
[1] 10
],
[3] [
[0] 11,
[1] [
[0] 12
],
[2] 13
]
],
[6] 14
]
]
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
Update: No longer need to assume that '0' is an invalid element, but the use of the iterator needs to be adjusted.
Update 2: Compared to Eily's solution, mine is iterative (non recursive), while Eily's is recursive.
UPdate 3: changed '>' to '>=' after testing, added test code.
| [reply] [d/l] [select] |
|
|
Note that you can drop the ", 0", the "ref" and both of the "next;"s.
Since you just need pairs, you can even drop the "["s and "]"s, reducing the code to:
sub genIterator {
my( $root ) = @_;
my @stack = ( $root, 0 );
return sub {
while( @stack ) {
my $index = pop @stack;
my $tree = pop @stack;
if( ref( $tree->[$index] ) ) {
push( @stack, $tree, $index+1, $tree->[$index], 0 );
} elsif( $index < @$tree ) {
push( @stack, $tree, $index+1 );
return $tree->[$index];
}
}
return; # empty stack
}
}
(Not some fundamental change, just a minor reworking.)
| [reply] [d/l] |
|
|
| [reply] |
|
|
It's identical to what you'd use for a breadth-first search (as an iterator or otherwise) except then you'd use a queue instead of a stack for a breadth-first search. In both cases, you need to save the search state in a local var rather than using recursion to save it on the system stack.
| [reply] |
|
|
I'm wondering if this could also be done with nested hashes.
A stored array index has a natural successor, but a hash key hasn't.°
Using each is accessing an internal iterator "cursor", but can't be used multiple times with the same hash.
The only solution which comes to mind is to initially cache the keys into an array and to go by index. But this static copy defies the idea of a dynamic iterator.
Any better idea?
update
Tie::Hash
requires us to implement a NEXTKEY this, lastkey method. But there doesn't seem to be a way to access this API with a native hash.
update
Is (keys %hash)[$idx++] optimized to be performant, or will it create a long list of keys to be thrown away?
°) well there is an internal order but this is a) kept secret and b) shuffled with every new run of perl
| [reply] [d/l] [select] |
|
|
|
|
Update 2: Compared to Eily's solution, mine is iterative (non recursive), while Eily's is recursive.
Yup, and my solution generates a lots of closures on the fly. This would probably make it slower on structures that are deeper than they are wide.
Your solution doesn't seem to work though, it goes down the levels alright, but never climbs back up. I'm afraid trying to work on this in an iterative way (I kind of see the idea, and I know it's possible) just confuses the perl out of me, so I can't point out what's wrong.
| [reply] |
|
|
| [reply] |
|
|
| [reply] |
|
|
cbIterator {
goto STOP if $_ >700;
print;
} \@nested;
STOP:
Terminology: What's an "Iterator"?
Than there is some terminology confusion, in my books ( IIRC including Higher Order Perl ) this "pass a call-back" approach isn't an iterator but something I'd maybe call a diver or walker ¹, which does the iteration inside out by processing a call-back.
I.o.W for me in Perl speak:
iterator := iterator function
I checked Iterator on WP and of course there is still confusion, but at least someone proposed Iteratee
Iteratee, in which, instead of the developer calling the iterator repeatedly to get new values, the iteratee is called repeatedly to process new chunks of data - an example of inversion of control
Where your solution excels
The real problem with iterators in Perl is how to continue them after the first stop. (The OP didn't ask this)
Thats why Perl6 introduces gather/take and Python has "generators" with yield constructs.
Since you're flattening the recursion into a loop with its own call @stack , this can easily be used for such a "coroutine".
The trick is to store persisting data like @stack in the surrounding closure of the generator!
Finally
Hope this was of interest. :)
¹) I'd love to be pointed to some sources clarifying the terminology. | [reply] [d/l] [select] |
|
|
Should be noted that the initial problem of the OP (short-circuiting a call-back) is easily solved with goto
Not if you wish to have the option of resuming the iteration; which I do. (Admittedly this was not mentioned in the OP.)
In addition, this suffers from all the same problems as throwing an exception to effect the early exit, which is (ab)using the exception mechanism for flow control.
Another reason for wanting an iterator is the ability to have multiple, concurrent, independent iterators; which is key to my application.
Last but not least is the avoidance of the nightmare that is Inversion of Control.
Think about how you'd have to restructure so many of your programs -- and how hard, if not impossible that would be -- if Perl's hashes only provided callback iterators. What a mess!
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
| [reply] |
|
|
|
|
|
|
If I were to have implemented this in code, I would have probably wrapped this into something like Iterator::Simple, which wraps the 'iteratee' into an object, and adds syntactic sugar that sets it up to be accessed with ->next(), <$it>, and other sugar and coercions that help to gloss over some of the details. I believe that the object created by the module would fit the definition of the iterator. I do agree that the function I provided should have probably been referred to in my test code as $cb or some such.
| [reply] [d/l] [select] |
|
|
|
|
|
|
>
while (defined( my $x = $it->() )) {
this whole defined thing to handle false values is IMHO flawed cargo cult, what if undef is a legal value of the data structure???
the easy solution is a list assignment denoted by parens
while ( my ( $x ) = $it->() ) {
the loop will only terminate if the returned list is empty.
Hence problem solved with even shorter syntax.
(And returning multiple values becomes trivial, too)
| [reply] [d/l] [select] |
Re: An iterator for (not "iterating") a recursive data structure.
by Eily (Monsignor) on Feb 12, 2015 at 13:43 UTC
|
This is pretty rough, and only works if undef is not a possible value, and empty arrays are not possible. But you should have no problem to adapt it:
#! perl
use v5.14;;
use Data::Dump qw[ pp ];
sub genNested {
my $n = shift;
map{ rand() < 0.5 ? [ rand(1000), genNested( $n-1 ) ] : rand( 1000
+ ) } 1 .. $n;
}
my @nested = genNested 3;
pp \@nested;
my @flat = 1..10;
sub genIterator(+)
{
my $array = shift;
my $index = 0; # Current index in the list
my $child; # Iterator for the level below if the current element is
+ an array
return sub {
my $element;
if (defined $child)
{
$element = $child->();
}
if (!defined $element)
{
$child = undef;
$element = $index < @$array ? $array->[$index++] : undef;
if (ref $element eq 'ARRAY')
{
$child = genIterator($element);
$element = $child->();
}
}
return $element;
}
}
my $next = genIterator @nested;
say while $_ = $next->();
| [reply] [d/l] |
|
|
sub genIterator {
my $array = shift;
my $index = 0; # Current index into $array
my $child_iter; # Iterator for array at $array[$index]
return sub {
while (1) {
# If we're iterating through descendants of a child,
if (defined $child_iter) {
# If the there is a descendant we haven't returned,
if ( my ($element) = $child->() ) {
return $element;
}
# No more descendants for this child.
$child_iter = undef;
}
# Signal end of iteration if done.
return if $index >= @$array;
# Return next child if it's not an array ref.
my $element = $array->[$index++];
return $element if ref $element ne 'ARRAY';
# Prepare to iterate through descendants of child.
$child_iter = genIterator($element);
}
}
}
my $iter = genIterator \@nested;
say while ($_) = $iter->();
| [reply] [d/l] |
|
|
++. It looks like I forgot to test my code for false values (it took me a while to see where they wouldn't work, it was the very last line where I iterate while the value is true).
Thanks for the correction, for some reason I'm still not sure I would have thought of it even nowadays. I suppose I don't really think about using the number of returned values when I'm expecting a scalar, rather than a list
| [reply] |
|
|
|
|
|
|
|
> and only works if undef is not a possible value
If you are suffering from the undef is a value problem¹, than call iterators in list context.
Re: list context (iterators).
I have to admit that I didn't try to understand your code , though! :)
¹) aka "0 but true" nonsense
| [reply] |
|
|
Though it kind of bothered me at first, now I really like the way python handles the semipredicate problem, which is to throw an exception at the end of the iteration rather than try to return a specific value (which may end up as part of an iterable). Besides, I think that python's yield would have been one of the easiest ways to solve this problem. For those who don't know about yield it stops the execution of a function (it can be in the middle of an infinite loop) which will be resumed when the next value is fetched. Perl 6's lazy lists can be used in a similar way according to its faq
As for understanding my code, the concept is actually fairly simple (well, to me it is, but I'm used to working with closures). I started with a flat list iterator: a closure that holds the array and the current index. In the nested list version, each flat list iterator has a children which is an iterator for the current element if the element itself is an array. You could see it as equivalent to this:
def next:
if array[index].isAnArray() and array[index].hasNext()
ret = array[index].next();
else
ret = array[index]
endif
index++
end
. This means that the calling stack for a deeply nested list will be something like:
parent.next()
child.next()
grandChild.next()
...
And so each time a value is fetched, which is why an iterative solution like MidLifeXis's can be far more effective.
| [reply] [d/l] [select] |
|
|
|
|
|
|
| [reply] |
Re: An iterator for (not "iterating") a recursive data structure.
by salva (Canon) on Feb 12, 2015 at 14:14 UTC
|
A stack of indexes?
In order to avoid browsing the data structure from the root every time, you may want to keep a stack of pairs (reference, index) instead. | [reply] |
Re: An iterator for (not "iterating") a recursive data structure.
by karlgoethebier (Abbot) on Feb 12, 2015 at 15:27 UTC
|
use strict;
use warnings;
use Data::Rmap qw(rmap);
my @data = [ [0], 1, 2, [ 3, 4, 5 ], 6, [ 7, 8, [ 9, 10 ], [ 11, [12]
+, 13 ], ], 14 ];
my $something = 5;
eval {
rmap { $_ <= $something ? print : die; } @data;
};
__END__
monks>rmap.pl
0
1
2
3
4
5
Edit: Forgot to copy&paste something ;-) ...and fixed some entities
Best regards, Karl
«The Crux of the Biscuit is the Apostrophe»
| [reply] [d/l] |
Re: An iterator for (not "iterating") a recursive data structure.
by Anonymous Monk on Feb 12, 2015 at 14:05 UTC
|
Which is fine unless you need to be able to quit the iteration at an arbitrary point. (ie. short circuit rather completing the full iteration).
Well in functional languages a simple way to do it would be to throw an exception when want to exit recursion (maybe not in weird languages like Haskell). So maybe consider just dying inside the callback when you're done with it, and catch with eval.
| [reply] |
Re: An iterator for (not "iterating") a recursive data structure.
by LanX (Saint) on Jul 13, 2019 at 18:49 UTC
|
I found all solutions so far overly complicated, what am I missing?
please note that the maximal length of the queue is only the sum of children at the current descend path, not all tree elements. (e.g. a binary tree with 10 levels has 2**10 = 2014 nodes but the queue will never exceed 20 = 10*2 elements)
Changing from unshift to push will perform a breadth first search.
use strict;
use warnings;
use Data::Dump qw/pp dd/;
print "\n---------- Input\n";
sub genNested {
my $n = shift;
map{ rand() < 0.5 ? [ int rand(20), genNested( $n-1 ) ] : int rand
+( 20 ) } 1 ..$n;
}
my @nested = genNested( 3 );
print pp \@nested;
print "\n\n---------- my solution\n";
sub genIterator {
my $tree = shift;
my @queue = @{$tree};
return sub {
while ( my ($node) = splice @queue,0,1 ) { # shift is buggy
if ( ref $node eq 'ARRAY' ) {
unshift @queue, @{$node};
next;
} else {
return $node
}
}
return;
};
}
my $iter = genIterator( \@nested );
while ( my ($next) = $iter->() ) {
print "$next,\t";
}
print "\n\n---------- BUKs goal\n";
sub cbIterator (&$) {
my( $code, $tree ) = @_;
for my $i ( 0 .. $#$tree ) {
if( ref( $tree->[ $i ] ) eq 'ARRAY' ) {
&cbIterator( $code, $tree->[ $i ] );
}
else {
local $_ = $tree->[ $i ];
$code->();
}
}
}
cbIterator {
print "$_,\t";
} \@nested;
---------- Input
[
[934, [993, 105], [167, [213]]],
[71, [120, [973]], 135],
[404, [828, [571]], 945],
]
---------- my solution
934, 993, 105, 167, 213, 71, 120, 973, 135,
+ 404, 828, 571, 945,
---------- BUKs goal
934, 993, 105, 167, 213, 71, 120, 973, 135,
+ 404, 828, 571, 945,
update
fixed issue with strange shift behaviour.
update
changed from shift to splice to handle bug with false nodes | [reply] [d/l] [select] |
Re: An iterator for (not "iterating") a recursive data structure.
by Anonymous Monk on Feb 14, 2015 at 22:51 UTC
|
sub genIterator
{
my @stack = @_;
return sub
{
splice @stack, 0, 1, @{$stack[0]} while 'ARRAY' eq ref $stack[0];
shift @stack;
}
}
| [reply] [d/l] |
|
|
| [reply] |
|
|
> duplicating that on a stack would blow my memory.
It's not duplicating the memory, just caching the current path from root to leave, very similar to my solution here
Worst case scenario with a balanced two level "tree" would mean approx @stack == 2*sqrt(@nodes) .
For 200 million elements that's about 28000 elements.
If that's still to many store a lazy iterator in the @stack returning the children of one node
(like a function ref or an object or a tied array) and process it in the iterator.
| [reply] [d/l] |