Difference of array

by sandy1028 (Sexton)
on May 15, 2009 at 08:29 UTC ( [id://764213] : perlquestion )

sandy1028 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, if I take the diff of two files, one with 13423 lines and the other with 12354, the difference is around 1100 lines. But if I copy the contents of the files into arrays and use the code below, the difference is only about 200. How can I get the same count as diff file1 file2?
    @union = @intersection = @difference = ();
    %count = ();
    foreach $element (@array1, @array2) { $count{$element}++ }
    foreach $element (keys %count) {
        push @union, $element;
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }

Replies are listed 'Best First'.
Re: Difference of array
by CountZero (Bishop) on May 15, 2009 at 09:09 UTC
    If you do not want to re-invent the wheel, look at Array::Diff and Array::Compare.


    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Difference of array
by moritz (Cardinal) on May 15, 2009 at 08:36 UTC
    This is the code from perlfaq4, and it also says
    It assumes that each element is unique in a given array

    Is that the case for your arrays? If not, it would explain the difference.
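To make the uniqueness assumption concrete, here is a minimal sketch that runs the counting code from the question on made-up sample data (the array contents are hypothetical, not taken from the original files). An element repeated inside one array reaches a count of 2 and is reported as common to both arrays, even though the other array never contains it:

```perl
use strict;
use warnings;

# Hypothetical sample data: 'foo' appears twice in @array1 and never in @array2.
my @array1 = qw(foo foo bar baz);
my @array2 = qw(bar grmpf asdf);

# Same counting approach as in the question: one shared counter for both arrays.
my (%count, @intersection, @difference);
$count{$_}++ for @array1, @array2;

for my $element (sort keys %count) {
    # A count above 1 is taken to mean "present in both arrays".
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

print "intersection: @intersection\n";   # intersection: bar foo
print "difference:   @difference\n";     # difference:   asdf baz grmpf
```

Here 'foo' ends up in the intersection although it only occurs in @array1, so the difference comes out smaller than expected. Note also that diff compares lines in order, while this code ignores order entirely, so the two counts need not agree even when every element is unique.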

    Update: See almut's reply below, this code is nonsense.

    (Update:) I'd just store one array in the hash, something along these lines: (untested)

    my (@intersection, @union);
    my %count;
    @count{@array1} = undef;
    for (@array2) {
        if (exists $count{$_}) {
            push @union, $_;
        } else {
            push @intersection, $_;
        }
    }

    (Second update): Any idea why the answer in the FAQs makes such an IMHO needless assumption? The code without that assumption isn't much longer, and although I haven't benchmarked it I don't think it's much slower either (I even think it uses less memory).

      Any idea why the answer in the FAQs makes such an IMHO needless assumption?

      I think your code doesn't compute the difference, and the union also isn't what you'd normally define as union (even if you swap @intersection and @union)...

      my @array1 = qw(foo foo bar baz);
      my @array2 = qw(bar grmpf asdf);

      my (@intersection, @union);
      my %count;
      @count{@array1} = undef;
      for (@array2) {
          if (exists $count{$_}) {
              push @union, $_;
          } else {
              push @intersection, $_;
          }
      }

      use Data::Dumper;
      print Dumper \@intersection, \@union;

      __END__
      $VAR1 = [
                'grmpf',
                'asdf'
              ];
      $VAR2 = [
                'bar'
              ];
        Yes, you're totally right. If one allows duplicates, the union is just @union = (@array1, @array2).
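For the case where duplicates should not matter at all, one possible approach (a sketch, not the perlfaq4 code and not anything posted in this thread) is to record membership per array in separate hashes, so that repeats inside one array count only once. The sample data is the same as in the test above; the variable names %in1, %in2, @only_in_1 and @only_in_2 are made up for illustration:

```perl
use strict;
use warnings;

my @array1 = qw(foo foo bar baz);
my @array2 = qw(bar grmpf asdf);

# One membership hash per array; duplicates collapse into a single key.
my (%in1, %in2);
@in1{@array1} = ();
@in2{@array2} = ();

# Elements present in both arrays.
my @intersection = sort grep {  exists $in2{$_} } keys %in1;

# Elements present in exactly one array, kept apart by origin.
my @only_in_1 = sort grep { !exists $in2{$_} } keys %in1;
my @only_in_2 = sort grep { !exists $in1{$_} } keys %in2;

# Union of distinct elements: merge the keys of both hashes.
my %all   = (%in1, %in2);
my @union = sort keys %all;

print "intersection: @intersection\n";   # intersection: bar
print "only in 1:    @only_in_1\n";      # only in 1:    baz foo
print "only in 2:    @only_in_2\n";      # only in 2:    asdf grmpf
print "union:        @union\n";          # union:        asdf bar baz foo grmpf
```

Keeping the two "only in" lists separate is a design choice: their sizes correspond loosely to removed and added entries, which is closer to what a diff reports than a single combined difference list. It still ignores line order and repeat counts, so it will not reproduce the output of diff exactly.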
Re: Difference of array
by wol (Hermit) on May 15, 2009 at 10:11 UTC
    If you're manipulating sets then take a look at Set::Scalar.

    use JAPH;
    print JAPH::asString();