telcontar has asked for the wisdom of the Perl Monks concerning the following question:

I have to deep copy a data structure, which consists of scalars, hashes, and coderefs, but have to do some processing when I get to the "coderef" part, i.e. this is not copied from the original structure but replaced with other data. There won't be any circular references, so my code doesn't have to deal with that.

I am aware that there are several CPAN modules that can deep copy data. Storable even seems to have hooks that can influence how data is being copied, but I'm not sure I understand how to use that to my advantage. Of course I could simply copy the structure with a CPAN module function, then traverse it, remove the coderefs and replace them with the data I want. Seems clumsy.

Merlyn has a nice article on Deep Copy and Recursion on his site. The following is an adaptation of code that I used shamelessly:
sub _copy { my $this = shift; if (not ref $this) { $this } elsif (ref $this eq "HASH") { +{map { $_ => _copy($this->{$_})} keys %$this} } elsif (ref $this eq "ARRAY") { [map _copy($_), @$this] } elsif (ref $this eq "CODE") { # do some processing here, the following is a # placeholder 'CODEREF' } else { Carp::croak "What's a " . ref $_ . "?" } }
This works quite well. But let us assume, for the sake of the argument, that this should be implementet iteratively, not recursively.
I came up with this code, which takes a hashref as an argument and returns a copy of a data structure consisting of (references to) hashes and scalars, and inserts some placeholder text in place of coderefs.
sub _iterative_copy { my $this = shift; my @pile = $this; my $this_new = {}; my $root = $this_new; my @pile_new = $this_new; while (defined($this = shift @pile)) { $this_new = shift @pile_new; if (ref $this eq "HASH") { for (keys %$this) { if (ref (my $v = $this->{$_})) { if (ref $v eq "HASH") { $this_new->{$_} = {} } elsif (ref $v eq "ARRAY") { $this_new->{$_} = [] } elsif (ref $v eq "CODE") { # do some processing here, # the following is a placeholder $this_new->{$_} = "CODEREF" } else { Carp::croak "What's a " . ref $v } unshift @pile, $v; unshift @pile_new, $this_new->{$_}; } else { $this_new->{$_} = $v } } } # quietly ignore coderefs, # they are dealt with in the inner loop elsif (ref $this ne "CODE") { Carp::croak "What's a " . ref $this } } $root;
It doesn't look very elegant. Can this be implemented more efficiently? I ask for enlightenment, for the purpose of understanding how to implement this sort of thing iteratively. One can easily imagine where using recursion on a very large data structure would use up a lot of memory.

Thank you.

Replies are listed 'Best First'.
Re: Iteratively deep copying a data structure
by ikegami (Patriarch) on Jun 09, 2007 at 12:08 UTC

    One can easily imagine where using recursion on a very large data structure would use up a lot of memory.

    No, I can't. How deep is your structure? 10 levels? Even if it was 100 levels deep, that's very little memory.

      You're right, it was silly of me to write that. However, if I may say so, the intent of the original post was just to see how an iterative solution could be implemented more efficiently or succinctly, as pointless as that may seem.
Re: Iteratively deep copying a data structure
by ikegami (Patriarch) on Jun 09, 2007 at 19:20 UTC

    It doesn't look very elegant.

    Checking the reference type twice is not elegant. More importantly, it doesn't work since you only handle HASH (and not ARRAY and non-refs) at the outer level.

    _iterative_copy( [ 'a' ] );

    outputs

    What's a ARRAY at script.pl line 12 main::_iterative_copy('ARRAY(0x15d5f18)') called at script.pl +line 47

    Working code:

    use strict; use warnings; use Data::Dumper qw( Dumper ); sub iterative_copy { my $root; my @src; my @dst; push @src, $_[0]; push @dst, \$root; while (@src) { my $src = shift(@src); my $dst = shift(@dst); if (not ref $src) { $$dst = $src; } elsif (ref $src eq "HASH") { $dst = ($$dst = {}); foreach my $key (keys %$src) { push @src, $src->{$key}; push @dst, \($dst->{$key}); } } elsif (ref $src eq "ARRAY") { $dst = ($$dst = []); foreach my $i (0..$#$src) { push @src, $src->[$i]; push @dst, \($dst->[$i]); } } else { Carp::croak "What's a " . ref $_ . "?" } } return $root; } { local $Data::Dumper::Indent = 0; my $d1 = [ [ 'a', 'b' ], { k1=>'v1', k2=>'v2' } ]; my $d2 = iterative_copy($d1); print(Dumper($d1), "\n"); print(Dumper($d2), "\n"); print($d1, " != ", $d2, "\n"); print($d1->[0], " != ", $d2->[0], "\n"); print($d1->[1], " != ", $d2->[1], "\n"); }

    Output:

    $VAR1 = [['a','b'],{'k2' => 'v2','k1' => 'v1'}]; $VAR1 = [['a','b'],{'k2' => 'v2','k1' => 'v1'}]; ARRAY(0x1ae0718) != ARRAY(0x1ae073c) ARRAY(0x15d5fd8) != ARRAY(0x1ae076c) HASH(0x15d6c98) != HASH(0x1ae0790)

    Below is a more elegant version. It uses one array (@todo) instead of two (@src, @dst). However, it uses more memory.

      I didn't really need arrayrefs, so I left them out of the iterative implementation.

      Yes, that looks much better. Thank you for your help! :-)