perlquestion
melmoth
<p>
I have an array of array references (a two-dimensional array) called @paths. It can contain anywhere from 2 to 163888 array references, and each reference points to an array with 2 to 12 elements.
The code below works, but when @paths contains a lot of references it is too slow. My goal is to remove every array whose elements are all contained in another array. For example, given
<code>
my @array1 = qw(A B F G);
my @array2 = qw(A B C D E F G);
</code>
<p>
array1 should be removed, because all of its elements are in array2.
My strategy was to sort @paths by array size (smallest to largest) and then, for each array, loop over all the larger arrays that follow it to check whether it is contained in one of them. As soon as we find an array that contains it, we remove it and move on to the next smallest array, and so on. But 32488 arrays is a lot to go over like this, and I need a faster way. If someone knows how to do this I'd really appreciate it. Thanks. - Robert
</p>
<code>
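my @filtered;

# Sort smallest to largest so that any superset of a path comes after it.
@paths = sort { @$a <=> @$b } @paths;

LINE:
for ( my $i = 0; $i < scalar @paths; $i++ )
{
    my $path = $paths[$i];

    # Lookup hash of the elements in the current (smaller) path.
    my %nodes;
    @nodes{@$path} = ();

    for ( my $j = $i + 1; $j < scalar @paths; $j++ )
    {
        my $path_b = $paths[$j];

        # Count how many elements of the larger path occur in the current path.
        my $c = grep { exists $nodes{$_} } @$path_b;

        # Every element was found, so $path is contained in $path_b: skip it.
        next LINE if $c == scalar @$path;
    }

    push @filtered, $path;
}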
</code>
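<p>
One possible direction, offered only as a rough sketch and not benchmarked on data of this size: instead of comparing every pair, build an inverted index from each element to the kept paths that contain it, and process the paths from largest to smallest. A path is then checked only against kept paths that share at least one element with it. The name <tt>filter_contained_paths</tt> is hypothetical. The sketch assumes elements are plain strings. Unlike the loop above, it returns the surviving paths largest first.
</p>

```perl
use strict;
use warnings;

# Hypothetical helper: returns the paths that are not contained in another path.
sub filter_contained_paths {
    my @paths = @_;
    my %index;    # element => indices into @kept of the paths containing it
    my @kept;

    # Largest first, so any superset of a path is seen before the path itself.
    for my $path ( sort { @$b <=> @$a } @paths ) {
        my %seen;
        my @elems = grep { !$seen{$_}++ } @$path;    # unique elements

        # For each kept path, count how many of our elements it shares.
        my %hits;
        my $contained = 0;
        ELEM:
        for my $e (@elems) {
            for my $k ( @{ $index{$e} || [] } ) {
                if ( ++$hits{$k} == @elems ) {
                    $contained = 1;    # kept path $k holds every element
                    last ELEM;
                }
            }
        }
        next if $contained;

        # Not contained anywhere: keep it and register its elements.
        push @kept, $path;
        push @{ $index{$_} }, $#kept for @elems;
    }
    return @kept;
}

my @filtered = filter_contained_paths(
    [qw(A B F G)],
    [qw(A B C D E F G)],
    [qw(X Y)],
);
```

<p>
Checking only against kept paths is enough because containment is transitive: if a path was dropped, whatever contains it also contains anything it contained. How much this helps depends on the data; if a few elements appear in almost every path, their index lists get long and the gain shrinks.
</p>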