onslaught has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl monks, I'm new to Perl and struggling with a hash of arrays I've created. I'm trying to use a foreach loop to loop through the keys of the hash of arrays and then delete array elements that meet a certain criterion: two array elements, belonging to two particular keys in the hash of arrays, are the same. First I create a normal hash called %phash using data from one file; then I read in another file to create the hash of arrays, called %harry. I'm printing the values I wish to delete first, to check that I'm accessing the correct element of each array:
    foreach $k (keys %harry) {
        $acount = 0;
        $bcount = 0;
        if (exists($phash{$k})) {
            foreach $a (@{$harry{$k}}) {
                foreach $b (@{$harry{$phash{$k}}}) {
                    if ($a eq $b) {
                        print "$b\n"; #this works as I expect
                        print "$a\n"; #this works as I expect
                        print "${$harry{$phash{$k}}}[$bcount]\n"; #this returns error Use of uninitialized
                                                                 #value in concatenation (.) or string
                        print "${$harry{$k}}[$acount]\n"; #this works as I expect
                        delete ${$harry{$phash{$k}}}[$bcount];
                        delete ${$harry{$k}}[$acount];
                        $deletions = $deletions +2;
                    }
                    $bcount = $bcount +1;
                }
                $acount = $acount +1;
            }
        }
    }
I'm confused by the error, since accessing the element ${$harry{$k}}[$acount] works fine, but if I replace $k with the value contained in $phash{$k}, the value is apparently uninitialised. I know this can't be the case, as $b prints fine, so I'm wondering whether there is something I'm missing with regard to accessing array elements in a hash of arrays. Thanks in advance for any help, and sorry if it's something really simple I'm missing.

Re: Accessing (deleting) array elements in a hash of arrays
by jwkrahn (Abbot) on Sep 21, 2008 at 23:41 UTC

    Your problem could be because delete does not remove an array element; it just sets its value to undef. To actually remove an array element you need to use splice, but you can't do that from inside a foreach loop that is iterating over that array.
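A minimal sketch of the difference between delete and splice on an interior element (the array names are made up for the example):

```perl
use strict;
use warnings;

# delete on an interior element: the slot stays, it just no longer exists
my @with_delete = qw(a b c d);
delete $with_delete[1];
my $len_after_delete = scalar @with_delete;             # still 4
my $slot_exists      = exists $with_delete[1] ? 1 : 0;  # 0, reads as undef

# splice really removes the element and shifts the later ones down
my @with_splice = qw(a b c d);
splice(@with_splice, 1, 1);
my $joined = join ' ', @with_splice;                    # "a c d"

print "delete: length $len_after_delete, exists=$slot_exists\n";
print "splice: $joined\n";
```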

    Also,  ${$harry{$phash{$k}}}[$bcount] is usually written as  $harry{$phash{$k}}[$bcount].
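All three spellings below fetch the same element; the %harry and %phash contents are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical data shaped like the question's %harry and %phash
my %harry = (x => [10, 20, 30], y => [40, 50, 60]);
my %phash = (x => 'y');
my $k     = 'x';

my $v1 = ${ $harry{$phash{$k}} }[1];  # explicit dereference block
my $v2 = $harry{$phash{$k}}->[1];     # arrow syntax
my $v3 = $harry{$phash{$k}}[1];       # arrow is optional between subscripts

print "$v1 $v2 $v3\n";
```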

Re: Accessing (deleting) array elements in a hash of arrays
by ysth (Canon) on Sep 22, 2008 at 00:49 UTC
    Some sample data would be helpful in figuring out whether you have other problems, but you most certainly should not delete from an array you are looping through with foreach.

    Try looping like this instead:

    for ($acount = $#{$harry{$k}}; $acount >= 0; --$acount) {
        my $a = $harry{$k}[$acount];
        for ($bcount = $#{$harry{$phash{$k}}}; $bcount >= 0; --$bcount) {
            my $b = $harry{$phash{$k}}[$bcount];
            if ($a eq $b) ...
                splice(@{$harry{$k}}, $acount, 1);
                splice(@{$harry{$phash{$k}}}, $bcount, 1);
            ...
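A self-contained sketch of the backward-looping approach, with invented sample data. The `last` after the two splices is my assumption: each element of the first array should cancel at most one equal element of the second, and once it has been spliced out there is nothing left at that index to compare.

```perl
use strict;
use warnings;

# Hypothetical sample data: key 'x' is paired with key 'y' through %phash
my %harry = (x => [qw(a b c d)], y => [qw(c e a)]);
my %phash = (x => 'y');

for my $k (keys %phash) {
    my $a1 = $harry{$k};
    my $a2 = $harry{$phash{$k}};
    # Walk both arrays from the end, so a splice never shifts an
    # element we have yet to visit.
    for (my $i1 = $#$a1; $i1 >= 0; --$i1) {
        for (my $i2 = $#$a2; $i2 >= 0; --$i2) {
            next if $a1->[$i1] ne $a2->[$i2];
            splice(@$a1, $i1, 1);
            splice(@$a2, $i2, 1);
            last;   # $a1->[$i1] is gone; move on to the next $i1
        }
    }
}
print "x: @{ $harry{x} }\n";
print "y: @{ $harry{y} }\n";
```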
Re: Accessing (deleting) array elements in a hash of arrays
by jethro (Monsignor) on Sep 21, 2008 at 23:56 UTC

    See manual page perlsyn:

    If any part of LIST is an array, "foreach" will get very confused if you add or remove elements within the loop body, for example with "splice". So don't do that.

    UPDATE: jwkrahn is right: delete doesn't shrink the array except when it is the last element, so the manual page excerpt above has nothing to do with the problem.
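A small illustration of that exception, assuming current perlfunc semantics for delete on arrays (deleting the final element shrinks the array; deleting an interior one does not):

```perl
use strict;
use warnings;

my @arr = (1, 2, 3, 4);

delete $arr[1];                  # interior element: length unchanged
my $len_interior = scalar @arr;  # 4

delete $arr[$#arr];              # last element: the array shrinks
my $len_last = scalar @arr;      # 3

print "$len_interior $len_last\n";
```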

Re: Accessing (deleting) array elements in a hash of arrays (scalability)
by ikegami (Patriarch) on Sep 22, 2008 at 02:58 UTC

    Using splice is very inefficient; it scales very poorly. With K keys and arrays of lengths A1 and A2, the following is O(K*A1*A2*(A1+A2)):

    for my $k ( keys %harry ) {
        my $a1 = $harry{$k};
        my $a2 = $harry{$phash{$k}};
        for (my $i1 = @$a1; $i1-- > 0; ) {
            for (my $i2 = @$a2; $i2-- > 0; ) {
                next if $a1->[$i1] ne $a2->[$i2];
                splice( @$a1, $i1, 1 );
                splice( @$a2, $i2, 1 );
                ...
            }
        }
    }

    If you keep track of the indexes to delete and delete them later, you can greatly improve the scalability. The following is O(K*A1*A2).

    for my $k ( keys %harry ) {
        my $a1 = $harry{$k};
        my $a2 = $harry{$phash{$k}};
        my %delete_a1;
        my %delete_a2;
        OUTER: for my $i1 ( 0 .. $#$a1 ) {
            #next if $delete_a1{$i1};  # Never true.
            for my $i2 ( 0 .. $#$a2 ) {
                next if $delete_a2{$i2};
                next if $a1->[$i1] ne $a2->[$i2];
                $delete_a1{$i1} = 1;
                $delete_a2{$i2} = 1;
                next OUTER;
            }
        }
        @$a1 = map $a1->[$_], grep !$delete_a1{$_}, 0..$#$a1;
        @$a2 = map $a2->[$_], grep !$delete_a2{$_}, 0..$#$a2;
    }

    You can even do better at the cost of readability. The following is O(K*(A1+A2)).

    for my $k ( keys %harry ) {
        my $a1 = $harry{$k};
        my $a2 = $harry{$phash{$k}};
        my %seen;    # value => indexes in @$a1 holding that value
        for ( 0..$#$a1 ) {
            push @{ $seen{ $a1->[$_] } }, $_;
        }
        my %delete_a1;
        my %delete_a2;
        for ( 0..$#$a2 ) {
            my $seen = $seen{ $a2->[$_] };
            if ( $seen && @$seen ) {
                $delete_a1{ shift(@$seen) } = 1;
                $delete_a2{$_} = 1;
            }
        }
        @$a1 = map $a1->[$_], grep !$delete_a1{$_}, 0..$#$a1;
        @$a2 = map $a2->[$_], grep !$delete_a2{$_}, 0..$#$a2;
    }
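A self-contained sketch of the same index-hash idea for a single pair of arrays, wrapped in a helper named `remove_common_pairs` (a name made up here). The sample data deliberately contains duplicates to show that matches are removed one-for-one.

```perl
use strict;
use warnings;

# Each element of @$a2 cancels at most one equal element of @$a1.
sub remove_common_pairs {
    my ($a1, $a2) = @_;
    my %seen;    # value => indexes in @$a1 not yet matched
    push @{ $seen{ $a1->[$_] } }, $_ for 0 .. $#$a1;

    my (%delete_a1, %delete_a2);
    for my $i2 (0 .. $#$a2) {
        my $idxs = $seen{ $a2->[$i2] };
        next unless $idxs && @$idxs;
        my $i1 = shift @$idxs;          # earliest unmatched equal element
        $delete_a1{$i1} = 1;
        $delete_a2{$i2} = 1;
    }
    # Rebuild both arrays without the marked indexes
    @$a1 = map { $a1->[$_] } grep { !$delete_a1{$_} } 0 .. $#$a1;
    @$a2 = map { $a2->[$_] } grep { !$delete_a2{$_} } 0 .. $#$a2;
    return;
}

# Hypothetical sample data with duplicate 'p' values
my @left  = qw(p p q r);
my @right = qw(p s p p);
remove_common_pairs(\@left, \@right);
print "left:  @left\n";
print "right: @right\n";
```

Each array is scanned a constant number of times, which is where the O(A1+A2) cost per key comes from.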

    All of these are fairly complicated because I didn't assume @$a1 and @$a2 each contained only unique elements. If @$a1 contains no duplicates, and if @$a2 contains no duplicates, you could use a simple set difference.

    for my $k ( keys %harry ) {
        my $a1 = $harry{$k};
        my $a2 = $harry{$phash{$k}};
        my %seen;
        ++$seen{$_} for @$a1, @$a2;
        @$a1 = grep $seen{$_} == 1, @$a1;
        @$a2 = grep $seen{$_} == 1, @$a2;
    }
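A quick check of the set-difference version on made-up data. It is only correct under the stated assumption that neither array contains duplicates; a value repeated within one array would be counted twice and dropped.

```perl
use strict;
use warnings;

# Hypothetical data: no duplicates within either array
my @a1 = qw(apple banana cherry);
my @a2 = qw(banana date);

my %seen;
++$seen{$_} for @a1, @a2;           # count each value across both arrays
@a1 = grep { $seen{$_} == 1 } @a1;  # keep values found in only one array
@a2 = grep { $seen{$_} == 1 } @a2;

print "a1: @a1\n";
print "a2: @a2\n";
```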
Re: Accessing (deleting) array elements in a hash of arrays
by sflitman (Hermit) on Sep 21, 2008 at 23:59 UTC
    I think the problem is where you use ${...}[$index]. Try accessing array elements in a hash of arrays as $harry{$phash{$k}}->[$bcount]. The other problem is that I don't think you can call delete on array elements like that, at least not in Perl 5.8.x: the docs say all you get is undef at that position, and the later elements don't shift down; you need splice for that. I think I'd probably approach this general problem differently, using grep.
    #!/usr/bin/perl
    use strict;
    use Data::Dumper;

    my %hashOfArrays=( a=>[1,2,3], b=>[4,5,6], c=>[7,8,9] );
    my %toDelete=( a=>2, b=>6, c=>8 );

    for my $array (keys %hashOfArrays) {
        if ($toDelete{$array}) {
            $hashOfArrays{$array}=[ grep { $_ != $toDelete{$array} } @{$hashOfArrays{$array}} ];
        }
    }

    print Dumper(\%hashOfArrays);
    # prints
    # $VAR1 = {
    #           'c' => [
    #                    7,
    #                    9
    #                  ],
    #           'a' => [
    #                    1,
    #                    3
    #                  ],
    #           'b' => [
    #                    4,
    #                    5
    #                  ]
    #         };
    exit;
    I think this is what you're describing: a hash of arrays contains data which you may wish to selectively delete by specifying which array in the hash, and which element to delete. This code only allows one element at a time to be specified per array, but it wouldn't be hard to support deleting multiple elements if needed. Rather than use delete, I create a new anonymous array reference with [ EXPR ], and the array itself is built by grep BLOCK LIST, which turns the original array into an array of the elements that do not equal the element to delete. Hope that helps. SSF
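One way to support several deletions per array is to turn each delete list into a lookup hash and grep against it. This is a sketch with invented data; note that hash keys compare as strings, unlike the numeric != in the code above.

```perl
use strict;
use warnings;

my %hashOfArrays = (a => [1, 2, 3], b => [4, 5, 6], c => [7, 8, 9]);
# Hypothetical: a list of values to remove from each array
my %toDelete = (a => [1, 3], c => [8]);

for my $key (keys %toDelete) {
    next unless exists $hashOfArrays{$key};
    my %drop = map { $_ => 1 } @{ $toDelete{$key} };   # fast membership test
    $hashOfArrays{$key} = [ grep { !$drop{$_} } @{ $hashOfArrays{$key} } ];
}

for my $key (sort keys %hashOfArrays) {
    print "$key: @{ $hashOfArrays{$key} }\n";
}
```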
Re: Accessing (deleting) array elements in a hash of arrays
by toolic (Bishop) on Sep 22, 2008 at 02:03 UTC
    In addition to the helpful advice already given, a handy debugging tool is Data::Dumper. This can help to get an idea of your array/hash contents at arbitrary points in your code:
    use Data::Dumper;
    print Dumper(\%harry);
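Two Data::Dumper package variables make the output easier to read and compare between runs: Sortkeys prints hash keys in sorted order, and Indent = 1 uses a more compact indentation. The %harry contents here are a stand-in.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;  # stable, sorted key order
$Data::Dumper::Indent   = 1;  # fixed two-space indentation

# Hypothetical stand-in for the question's %harry
my %harry = (y => [qw(c)], x => [qw(a b)]);
my $dump = Dumper(\%harry);
print $dump;
```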

    It is also customary to post (small) segments of your data structures here to help us help you.

Re: Accessing (deleting) array elements in a hash of arrays (looping over indexes)
by ikegami (Patriarch) on Sep 22, 2008 at 02:33 UTC
    my $i = 0; for my $ele (@array) { ... $i++; }
    is better written as
    for my $i (0..$#array) { my $ele = $array[$i]; ... }

    It's much more readable, and it's not broken by the use of next. (And you don't have to worry about putting the $i=0 in the wrong place like you did.)

    The downside is that $ele is no longer an alias. Just use $array[$i] directly if you need to modify the original array. In fact, you can just use $array[$i] directly, period.
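A small demonstration of the aliasing difference, using a throwaway array:

```perl
use strict;
use warnings;

my @array = (1, 2, 3);

# foreach aliases the loop variable: changing $ele changes @array
for my $ele (@array) { $ele *= 10 }   # @array is now (10, 20, 30)

# Looping over indexes: $ele is a copy, so write through $array[$i]
for my $i (0 .. $#array) {
    my $ele = $array[$i];
    $ele += 1;           # affects only the copy
    $array[$i] += 5;     # modifies the original element
}
print "@array\n";
```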

Re: Accessing (deleting) array elements in a hash of arrays
by gone2015 (Deacon) on Sep 22, 2008 at 10:07 UTC

    So, you are stepping through two arrays using foreach $a and foreach $b, and in the inner loop you need the indexes for $a ($acount) and $b ($bcount).

    You are incrementing $acount and $bcount in the right place -- but you need to set $bcount to zero just before the foreach $b. I suggest that this accounts for your immediate problem.
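A sketch of the question's loop with that fix applied ($bcount reset before every pass of the inner loop), run on invented data. It only records the matched pairs; the deletion itself is better done with one of the splice or rebuild approaches from the other replies. The loop variables are renamed $va and $vb, my choice, to avoid the special sort variables $a and $b.

```perl
use strict;
use warnings;

# Hypothetical sample data; %phash pairs key x with key y
my %phash = (x => 'y');
my %harry = (x => [qw(a b c)], y => [qw(c d a)]);
my @matches;

for my $k (keys %harry) {
    next unless exists $phash{$k};
    my $acount = 0;
    for my $va (@{ $harry{$k} }) {
        my $bcount = 0;    # reset here, not once per key
        for my $vb (@{ $harry{$phash{$k}} }) {
            if ($va eq $vb) {
                # With the reset, $bcount now indexes the matching element
                push @matches,
                    $harry{$phash{$k}}[$bcount] . '=' . $harry{$k}[$acount];
            }
            $bcount++;
        }
        $acount++;
    }
}
print "$_\n" for @matches;
```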

    Others have pointed out that delete on arrays doesn't actually remove an array element, and if it did, who knows whether foreach would cope.

    From a style perspective I suggest that calling these indexes $acount obscures their purpose -- which is probably why the problem was hard to see. I would have called them $ia and $ib. Probably best to stick the initialisation of the index hard up against the related foreach.

    Also, from a style perspective I wonder whether:

    my $ia = 0 ; foreach my $a (@afoo) { ... acres of stuff ... $ia++ ; } ;
    is as clear as:
    for my $ia (0..$#afoo) { my $a = $afoo[$ia] ; } ;
    because in the second case the loop control is all at the top of the loop. Mind you, the two are not equivalent, because of the quantum entanglement (aliasing) between $a and the array element in the first! Perhaps better is:
    my $ia = -1 ; foreach my $a (@afoo) { $ia++ ; ...... } ;
    although I cannot help feeling that setting $ia to -1 is ugly :-(
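If Perl 5.12 or later is available, each on an array hands back the index and a copy of the value together, which keeps the loop control at the top without the -1 start. This is a sketch; note that each keeps a per-array iterator, so leaving the loop early with last means calling keys @afoo before iterating that array again.

```perl
use strict;
use warnings;
use 5.012;   # each/keys on arrays need Perl 5.12 or later

my @afoo = qw(x y z);
my @seen;

while (my ($ia, $va) = each @afoo) {
    push @seen, "$ia=$va";   # $va is a copy, not an alias
}
print "@seen\n";
```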