MrMadScience has asked for the wisdom of the Perl Monks concerning the following question:

In Unix, two files can be compared in a quick and dirty manner by using a
foreach line ( `cat file1` ) grep $line file2
to look for the lines shared by the two. I have written a Perl script that reads each file into an array, and I would like to use the arrays to find what is common to the two files and then work with both what is and is not common. Other than using Perl's grep function, what would be the best way to do this?
I found that there is an exists function for arrays, but it seems to require an index number, while I'm looking to search by array element. I've considered reading each file into a hash and then searching the hashes with exists, but I thought there ought to be an easier way... Thanks all!

Replies are listed 'Best First'.
Re: Comparing two file in Arrays
by skx (Parson) on Jan 02, 2004 at 18:24 UTC

    The Perl Cookbook has a section on working with pairs of arrays to find intersections, differences, etc.

    The following is one piece of code it contains - using a temporary hash:

    @a = (1, 3, 5, 6, 7, 8);
    @b = (2, 3, 5, 7, 9);

    @union = @isect = ();
    %union = %isect = ();
    %count = ();

    foreach $e (@a, @b) { $union{$e}++ && $isect{$e}++ }

    @union = keys %union;
    @isect = keys %isect;
    Steve
    ---
    steve.org.uk
      That is a nice, slick/quick method -- but it will produce false positives for intersections if either array happens to contain multiple elements with the same value (e.g. in your example, if @a contained two elements with a value of 6).
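One way around the duplicate problem, sketched under the assumption that each array should be treated as a set: build a lookup hash per array and test the distinct values of one array against the other array's hash. The helper `uniq_values` and the names `%in_a`, `%in_b`, `@only_a`, and `@only_b` are made up for this illustration and do not come from the Cookbook.

```perl
use strict;
use warnings;

my @a = (1, 3, 5, 6, 6, 7, 8);   # note the repeated 6
my @b = (2, 3, 5, 7, 9);

# Hypothetical helper: return the distinct values of a list, in first-seen order.
sub uniq_values {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Lookup hashes: one key per distinct value in each array.
my %in_a = map { $_ => 1 } @a;
my %in_b = map { $_ => 1 } @b;

my @isect  = grep {  $in_b{$_} } uniq_values(@a);   # in both arrays
my @only_a = grep { !$in_b{$_} } uniq_values(@a);   # in @a but not in @b
my @only_b = grep { !$in_a{$_} } uniq_values(@b);   # in @b but not in @a
my @union  = uniq_values(@a, @b);

print "isect:  @isect\n";
print "only a: @only_a\n";
print "only b: @only_b\n";
print "union:  @union\n";
```

With the repeated 6 in @a, the single-pass counting version would report 6 as shared; here it ends up in @only_a, because membership is checked against the other array's hash rather than against a running count.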
Re: Comparing two file in Arrays
by duff (Parson) on Jan 02, 2004 at 17:35 UTC

    In Unix, two files can be compared in a quick and dirty manner with the comm program. In Perl, you probably want to read perldoc -q intersection

      True, comm is also a good way to do that. Looping through the text file gives the added advantage of being able to selectively direct the text based on matches, which would otherwise need multiple invocations of comm. Thanks for pointing me to the readme, I think those methods ought to do it for me.
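For what it's worth, the "loop and direct the text" idea might look roughly like the sketch below. It assumes exact line matches after chomp, and it uses in-memory filehandles as stand-ins for the two input files and the two outputs so that it runs on its own. The variable names (`%in_file2`, `$common_fh`, `$unique_fh`) and the sample data are purely illustrative.

```perl
use strict;
use warnings;

# Stand-ins for the two files; real code would open paths instead.
my $file1 = "apple\nbanana\ncherry\n";
my $file2 = "banana\ncherry\ndate\n";

# Index every line of the second file in a hash for fast lookups.
open my $fh2, '<', \$file2 or die "Cannot read second input: $!";
my %in_file2;
while (my $line = <$fh2>) {
    chomp $line;
    $in_file2{$line} = 1;
}
close $fh2;

# In-memory output handles; these could just as well be real files.
my ($common, $unique) = ('', '');
open my $common_fh, '>', \$common or die "Cannot open common output: $!";
open my $unique_fh, '>', \$unique or die "Cannot open unique output: $!";

# Loop over the first file and send each line to the matching handle.
open my $fh1, '<', \$file1 or die "Cannot read first input: $!";
while (my $line = <$fh1>) {
    chomp $line;
    my $out = exists $in_file2{$line} ? $common_fh : $unique_fh;
    print {$out} "$line\n";
}
close $_ for $fh1, $common_fh, $unique_fh;

print "common:\n$common";
print "only in first:\n$unique";
```

Here exists is applied to a hash key rather than an array index, which is the lookup-by-value the original question was after. Lines found only in the second file would need a second pass in the other direction.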

        Algorithm::HowSimilar is probably the fastest solution but only because it uses the excellent Algorithm::Diff module.

        use Algorithm::HowSimilar 'compare';

        my (
            $av_similarity,
            $sim_ary1_to_ary2,
            $sim_ary2_to_ary1,
            $ref_ary_matches,
            $ref_ary_in_ary1_but_not_ary2,
            $ref_ary_in_ary2_but_not_ary1
        ) = compare( \@ary1, \@ary2 );

        cheers

        tachyon

Re: Comparing two file in Arrays
by jonnyfolk (Vicar) on Jan 02, 2004 at 18:23 UTC

    This is something I wrote a while back to compare two files line by line. It reports the line number and prints that line in both files, or prints no difference found. It might be useful to you...

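    #!/usr/bin/perl -w
    use strict;
    use CGI::Carp qw(fatalsToBrowser warningsToBrowser);
    use CGI ':standard';

    my $txt1 = "/path/public_html/test/txt1.txt";
    my $txt2 = "/path/public_html/test/txt2.txt";
    my $count       = 0;
    my $differences = 0;

    # Read the whole first file into memory.
    open FH, "<", $txt1 or die $!;
    my @all = <FH>;
    close FH;

    print "Content-type: text/html\n\n";

    # Walk the second file and compare each line with its counterpart.
    open PAGE, "<", $txt2 or die $!;
    while (my $line = <PAGE>) {
        $count++;
        my $item = shift @all;
        $item = '' unless defined $item;    # first file ran out of lines
        if ($line ne $item) {
            $differences++;
            print "line no: $count<br> $line<br> $item<br>";
        }
    }
    close PAGE;

    # Lines left over in the first file have no counterpart in the second.
    foreach my $item (@all) {
        $count++;
        $differences++;
        print "line no: $count<br> <br> $item<br>";
    }

    # Report once, rather than once per matching line.
    print "No differences found<br>" unless $differences;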
Re: Comparing two file in Arrays
by DrHyde (Prior) on Jan 02, 2004 at 19:32 UTC