Tanalis has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm currently working on a script that basically needs to read data from one of our databases, do some fairly basic processing on it, and then output that data to a file for reporting.

The data is returned from the database in an AOH, where each hash of the array corresponds to a row returned from the table.

Once this data has been read, the script takes each hash in turn, reads the customer ID, and finds the list of open orders for that customer. It then attempts to create a copy of the original hash for each order id, and pushes each new hash onto an output array.

The problem seems to occur with the push. If I output the data hash-by-hash just prior to the push, I get perfect data, but if I dump out the final array once the loop is complete, I seem to get very corrupt data - a whole load of undef and references to $output->[1].

The loop I'm using to modify the data and push onto the new array is as follows:

    my $output = [];
    foreach my $hash (@$data) {
        my $cust_id = $hash->{cust_id};
        foreach my $id (split ",", $order_data->{$cust_id}->{order_ids}) {
            $hash->{order_id} = $id;
            print Dumper $hash;     # outputting here yields perfect data
            push @$output, $hash;
        }
    }
    print Dumper $output;           # outputting here gives trashed data

I've been trying to figure this out for ages, and I've not had any luck - I'm wondering if I have referencing problems, but I've tried modifying the push with no luck as yet.

Can anyone see where this is going wrong? Any suggestions at this point would be appreciated :)

Thanks in advance,
-- Foxcub
A friend is someone who can see straight through you, yet still enjoy the view. (Anon)

Re: Push corrupting data
by zengargoyle (Deacon) on Mar 13, 2003 at 08:52 UTC

    you're pushing the same hash on the list, i think you want to make a copy yes?

        my $data = [
            { cust_id => 1 },
            { cust_id => 3 },
        ];
        my $order_data = {
            1 => { order_ids => '5,6' },
            3 => { order_ids => '7,8,9' },
        };
        my $output = [];
        foreach my $hash (@$data) {
            my $cust_id = $hash->{cust_id};
            foreach my $id (split ',', $order_data->{$cust_id}->{order_ids}) {
                $hash->{order_id} = $id;
                push @$output, {%{$hash}};
            }
        }
        use Data::Dumper;
        print Dumper $output;
        __END__
        $VAR1 = [
                  {
                    'cust_id' => 1,
                    'order_id' => '5'
                  },
                  {
                    'cust_id' => 1,
                    'order_id' => '6'
                  },
                  {
                    'cust_id' => 3,
                    'order_id' => '7'
                  },
                  {
                    'cust_id' => 3,
                    'order_id' => '8'
                  },
                  {
                    'cust_id' => 3,
                    'order_id' => '9'
                  }
                ];

      Thanks - that seems to have solved the problem.

      I'm kind of lost as to why a re-referenced dereference ({%{$hash}} rather than simply $hash) works, but pushing the original reference doesn't - would you mind trying to explain this?

      Thanks again for your help .. *smiles*
      -- Foxcub
      A friend is someone who can see straight through you, yet still enjoy the view. (Anon)

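        As I know very well since yesterday, dereferencing a hashref like this actually makes a copy of it: %{$hash} expands to a list of the hash's keys and values, and the outer { } builds a new anonymous hash from that list. So when you do {%{$hash}} you create a new hashref that points to a new (shallow) copy of whatever data $hash pointed to, whereas pushing $hash itself pushes the very same reference every time. That's why it works :).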
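
        To make the difference concrete, here is a minimal sketch. The helper names push_shared and push_copies are made up for illustration and are not from the original script; the first reproduces the aliasing, the second pushes a fresh anonymous hash per order id.

```perl
use strict;
use warnings;

# Hypothetical helper: pushes the SAME reference once per id, so every
# slot in the result aliases one hash and shows the last order_id.
sub push_shared {
    my ($row, @ids) = @_;
    my @out;
    for my $id (@ids) {
        $row->{order_id} = $id;
        push @out, $row;                        # same reference each time
    }
    return \@out;
}

# Hypothetical helper: builds a new anonymous hash per id, so each slot
# keeps its own order_id.
sub push_copies {
    my ($row, @ids) = @_;
    my @out;
    for my $id (@ids) {
        push @out, { %$row, order_id => $id };  # shallow copy plus new key
    }
    return \@out;
}

my $shared = push_shared({ cust_id => 1 }, 5, 6);
my $copies = push_copies({ cust_id => 1 }, 5, 6);

print join(',', map { $_->{order_id} } @$shared), "\n";   # 6,6
print join(',', map { $_->{order_id} } @$copies), "\n";   # 5,6
```

        Note that the copy is only one level deep: if a value in the row were itself a reference, both copies would still point at the same nested data.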

        CU
        Robartes-

Re: Push corrupting data
by robartes (Priest) on Mar 13, 2003 at 09:14 UTC
    zengargoyle nailed your problem down, but I played around a bit to reproduce your problem and have some extra information that might interest you. Specifically, the reason you get lots of references to $output->[1] in your last dump is that Data::Dumper does not do deepcopies by default: if it encounters array elements that point to the same thing, it prints a self-reference (hence $output->[1]). You can control this behaviour with $Data::Dumper::Deepcopy. E.g.

        use strict;
        use Data::Dumper;

        my $hashref = { 'one' => 'two' };
        my $aryref  = [ $hashref ];
        push @$aryref, $hashref;
        push @$aryref, $hashref;
        print "Without Deepcopy: ".Dumper($aryref);
        $Data::Dumper::Deepcopy=1;
        print "With Deepcopy: ".Dumper($aryref);
        print "Without Data::Dumper:\n";
        print join("\n", @$aryref);
        __END__
        Without Deepcopy: $VAR1 = [
                  {
                    'one' => 'two'
                  },
                  $VAR1->[0],
                  $VAR1->[0]
                ];
        With Deepcopy: $VAR1 = [
                  {
                    'one' => 'two'
                  },
                  {
                    'one' => 'two'
                  },
                  {
                    'one' => 'two'
                  }
                ];
        Without Data::Dumper:
        HASH(0x8111a94)
        HASH(0x8111a94)
        HASH(0x8111a94)

    So, the weird stuff you saw was actually a Data::Dumper artifact. There are actually three identical references in the array, but Data::Dumper only derefs one by default.
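
    If you would rather check for shared references without reading a dump at all, one possible approach is to compare addresses with refaddr from Scalar::Util. The helper name distinct_referents below is invented for this sketch.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical helper: counts how many distinct things the references
# in an array actually point to.
sub distinct_referents {
    my ($aryref) = @_;
    my %seen;
    $seen{ refaddr($_) }++ for @$aryref;
    return scalar keys %seen;
}

my $hashref = { one => 'two' };
my $shared  = [ $hashref, $hashref, $hashref ];    # one hash, three references
my $copied  = [ map { +{ %$hashref } } 1 .. 3 ];   # three separate hashes

print "shared: ", distinct_referents($shared), "\n";   # shared: 1
print "copied: ", distinct_referents($copied), "\n";   # copied: 3
```

    A count lower than the number of elements means at least two slots point at the same hash, which is exactly the situation Data::Dumper abbreviates.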

    Anyway, we now return you to the scheduled programme.

    CU
    Robartes-

      You can control this behaviour with $Data::Dumper::Deepcopy. E.g

      Actually, $Data::Dumper::Deepcopy is probably not the best way to understand what's going on here. $Data::Dumper::Deepcopy makes a copy of items that are referenced multiple times (but are not cyclic), which means that the output is not the same as the input. In "normal" mode, by contrast, Dumper uses a shorthand that is easy to read (i.e. it's clear what's going on to a reasonably well-versed Perl programmer) but is not necessarily valid Perl. A better approach is to use $Data::Dumper::Purity, which makes the output less clear to read but much more accurate. An example (using the OO form rather than the global-variable form of controlling the behaviour):

          #!perl -lw
          use strict;
          use Data::Dumper;

          my $hash  = {"A".."F"};
          my $array = [$hash,'foo',$hash];
          print $_ for
              "Normal:",   Data::Dumper->new([$array],[qw(array)])->Dump(),
              "Deepcopy:", Data::Dumper->new([$array],[qw(array)])->Deepcopy(1)->Dump(),
              "Purity:",   Data::Dumper->new([$array],[qw(array)])->Purity(1)->Dump(),
              "Terse:",    Data::Dumper->new([$array],[qw(array)])->Indent(1)->Terse(1)->Dump();

      outputs

      And the only one that is both valid (ie it compiles) and correct (ie it outputs exactly the same thing as its input) is the one labeled "Purity".
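
      One way to check this claim for yourself is to eval the dumped text and see whether the two slots still share one hash afterwards. This is only a sketch: round_trip and shares_hash are names invented here, and the lexical inside round_trip has to be called $array because that is the name passed to Data::Dumper->new.

```perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(refaddr);

# Hypothetical helper: dump with the given Data::Dumper object, eval the
# text, and return the rebuilt structure. The dump assigns to $array, so
# a lexical of that name must be in scope for the eval.
sub round_trip {
    my ($dumper) = @_;
    my $array;
    eval $dumper->Dump;
    die $@ if $@;
    return $array;
}

# Hypothetical helper: true if elements 0 and 2 are the same hash.
sub shares_hash {
    my ($aref) = @_;
    return refaddr($aref->[0]) == refaddr($aref->[2]) ? 1 : 0;
}

my $hash  = { 'A' .. 'F' };
my $input = [ $hash, 'foo', $hash ];

my $pure = round_trip(Data::Dumper->new([$input], ['array'])->Purity(1));
my $deep = round_trip(Data::Dumper->new([$input], ['array'])->Deepcopy(1));

print "input shares:    ", shares_hash($input), "\n";   # 1
print "Purity shares:   ", shares_hash($pure),  "\n";   # 1
print "Deepcopy shares: ", shares_hash($deep),  "\n";   # 0
```

      The Deepcopy dump compiles too, but it rebuilds two independent hashes, so it is valid without being a faithful copy of the input.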

      Deepcopy is useful if for some reason you _don't_ want anything referenced more than once (if possible). Note that these settings combine, so to apply Deepcopy to a cyclic and multiply referenced data structure like the following, you would need Purity _and_ Deepcopy.

          #!perl -lw
          use strict;
          use Data::Dumper;

          my $hash  = {'A'..'F'};
          my $array = [];
          @$array = ($hash,'foo',$hash,[$hash,'bar',$hash,$array]);
          print $_ for
              "Normal:",   Data::Dumper->new([$array],[qw(array)])->Dump(),
              "Deepcopy:", Data::Dumper->new([$array],[qw(array)])->Deepcopy(1)->Dump(),
              "Purity:",   Data::Dumper->new([$array],[qw(array)])->Purity(1)->Dump(),
              "PureDeep:", Data::Dumper->new([$array],[qw(array)])->Deepcopy(1)->Purity(1)->Dump();

      outputs

      The moral of the story is that if you need output that is both accurate and valid Perl, then Purity() is required. See Data::Dumper for more details.


      ---
      demerphq