Using splice is very inefficient here: each splice shifts every element after the deletion point, so it scales very poorly. The following is O(K*A1*A2*(A1+A2)), where K is the number of keys and A1 and A2 are the lengths of the paired arrays:
    for my $k ( keys %harry ) {
        my $a1 = $harry{$k};
        my $a2 = $harry{$phash{$k}};
        for ( my $i1 = @$a1; $i1--; ) {
            for ( my $i2 = @$a2; $i2--; ) {
                next if $a1->[$i1] ne $a2->[$i2];
                splice( @$a1, $i1, 1 );
                splice( @$a2, $i2, 1 );
                last;  # the element at $i1 is gone, so stop scanning @$a2
            }
        }
    }
If you keep track of the indexes to delete and delete them later, you can greatly improve the scalability. The following is O(K*A1*A2).
    for my $k ( keys %harry ) {
        my $a1 = $harry{$k};
        my $a2 = $harry{$phash{$k}};
        my %delete_a1;
        my %delete_a2;
        OUTER:
        for my $i1 ( 0 .. $#$a1 ) {
            #next if $delete_a1{$i1};  # Never true.
            for my $i2 ( 0 .. $#$a2 ) {
                next if $delete_a2{$i2};
                next if $a1->[$i1] ne $a2->[$i2];
                $delete_a1{$i1} = 1;
                $delete_a2{$i2} = 1;
                next OUTER;
            }
        }
        @$a1 = map $a1->[$_], grep !$delete_a1{$_}, 0..$#$a1;
        @$a2 = map $a2->[$_], grep !$delete_a2{$_}, 0..$#$a2;
    }
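For concreteness, here is a self-contained sketch of the index-tracking approach on some made-up sample data (the %harry and %phash contents below are invented for illustration). Note that it removes one occurrence per matched pair, so duplicate elements are handled:

```perl
use strict;
use warnings;

# Hypothetical sample data: key 'x' is paired with key 'y' via %phash.
my %harry = (
    x => [ 'a', 'b', 'b', 'c' ],
    y => [ 'b', 'c', 'c' ],
);
my %phash = ( x => 'y' );

# Iterating over %phash processes each pair once.
for my $k ( keys %phash ) {
    my $a1 = $harry{$k};
    my $a2 = $harry{ $phash{$k} };
    my %delete_a1;
    my %delete_a2;
    OUTER:
    for my $i1 ( 0 .. $#$a1 ) {
        for my $i2 ( 0 .. $#$a2 ) {
            next if $delete_a2{$i2};                # already claimed by an earlier match
            next if $a1->[$i1] ne $a2->[$i2];
            $delete_a1{$i1} = 1;
            $delete_a2{$i2} = 1;
            next OUTER;                             # each $i1 matches at most once
        }
    }
    # Rebuild both arrays, skipping the indexes marked for deletion.
    @$a1 = map $a1->[$_], grep !$delete_a1{$_}, 0..$#$a1;
    @$a2 = map $a2->[$_], grep !$delete_a2{$_}, 0..$#$a2;
}

print "@{ $harry{x} }\n";   # a b  (one 'b' paired off and was removed)
print "@{ $harry{y} }\n";   # c    (the 'b' and one 'c' were removed)
```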
You can do even better at the cost of readability. The following is O(K*(A1+A2)):
    for my $k ( keys %harry ) {
        my $a1 = $harry{$k};
        my $a2 = $harry{$phash{$k}};
        my %seen;
        for my $i1 ( 0 .. $#$a1 ) {
            push @{ $seen{ $a1->[$i1] } }, $i1;  # value => list of its indexes in @$a1
        }
        my %delete_a1;
        my %delete_a2;
        for my $i2 ( 0 .. $#$a2 ) {
            my $seen = $seen{ $a2->[$i2] };
            if ( $seen && @$seen ) {
                $delete_a1{ shift(@$seen) } = 1;  # claim one matching index in @$a1
                $delete_a2{$i2} = 1;
            }
        }
        @$a1 = map $a1->[$_], grep !$delete_a1{$_}, 0..$#$a1;
        @$a2 = map $a2->[$_], grep !$delete_a2{$_}, 0..$#$a2;
    }
All of these are fairly complicated because I didn't assume @$a1 and @$a2 each contained only unique elements. If neither @$a1 nor @$a2 contains duplicates, you can use a simple set difference instead.
    for my $k ( keys %harry ) {
        my $a1 = $harry{$k};
        my $a2 = $harry{$phash{$k}};
        my %seen;
        ++$seen{$_} for @$a1, @$a2;
        @$a1 = grep $seen{$_} == 1, @$a1;  # keep elements that appear in only one array
        @$a2 = grep $seen{$_} == 1, @$a2;
    }
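A minimal runnable sketch of this counting approach, again on invented sample data: each element's occurrences across both arrays are counted, and only elements seen exactly once (i.e., present in just one of the two arrays) survive, which matches what the pairwise-deletion versions above compute when there are no duplicates:

```perl
use strict;
use warnings;

# Hypothetical sample data with no duplicates within either array.
my %harry = (
    x => [ 'a', 'b', 'c' ],
    y => [ 'b', 'c', 'd' ],
);
my %phash = ( x => 'y' );   # pair key 'x' with key 'y'

for my $k ( keys %phash ) {
    my $a1 = $harry{$k};
    my $a2 = $harry{ $phash{$k} };
    my %seen;
    ++$seen{$_} for @$a1, @$a2;        # count occurrences across both arrays
    @$a1 = grep $seen{$_} == 1, @$a1;  # 'b' and 'c' have count 2, so they go
    @$a2 = grep $seen{$_} == 1, @$a2;
}

print "@{ $harry{x} }\n";   # a
print "@{ $harry{y} }\n";   # d
```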
In reply to Re: Accessing (deleting) array elements in a hash of arrays (scalability)
by ikegami
in thread Accessing (deleting) array elements in a hash of arrays
by onslaught