comparing two files

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, can someone tell me which approach to follow here? I have two files that contain information as follows:


 --DATA1--            --DATA2-- 
Monk Apple   One         Sam  Orange  Two    
Sam  Orange  Two         Sue  Apple   One    
Sue  Apple   One         Monk Apple   One
                         Mike Bannana One
                         Don  Apple   Two

What I need to do is compare the Data1 file with the Data2 file and print the differences to a Data3 file. I only print the name of the person, so my Data3 will look like this:

--DATA3--
Don
Mike

I was following an approach of using arrays


my @Found = ();

while (my $line = <DATA1>)
{
chomp $line1;

while (my $line2 = <DATA2>)
{
chomp $line2;

if ("$line1" eq "$line2") {}
else { push @Found, $line2 }
}
}

open (DATA3, "data3") or die;
{
print @Found;
}

It is not finding the difference between the two files. Can someone help? Thanks a lot.

Re: comparing two files by Roy Johnson (Monsignor) on Mar 09, 2004 at 15:49 UTC
If the order of lines isn't important, then you should use hashes:

Update: refactored to use one hash and to print only the first column.

```perl
my %seen;
# open DATA1 here
++$seen{$_} while (<DATA1>);
# close DATA1 here
# open DATA2 here
while (<DATA2>) {
    # Print any lines that are found, that weren't in DATA1
    print((split)[0], "\n") unless (defined(delete $seen{$_}));
}
# print what's left
print((split)[0], "\n") for (keys %seen);
```

If this quick and dirty approach isn't what you're looking for, check out the Algorithm::Diff module.

The PerlMonk `tr///` Advocate
Re: comparing two files by Limbic~Region (Chancellor) on Mar 09, 2004 at 15:55 UTC
Anonymous Monk,

You have not mentioned a few things I consider very important. Do you need to know which file the extra line(s) came from? Can one file contain the same line more than once, and is that relevant? Is order important?

I will give you both standard responses when this question is asked: use a hash, or use diff.

```perl
#!/usr/bin/perl
use strict;
use warnings;

open (FILE1, '<', 'file1.txt') or die "Unable to open file1.txt for reading : $!";
open (FILE2, '<', 'file2.txt') or die "Unable to open file2.txt for reading : $!";

my %lines;
while ( <FILE1> ) { chomp; $lines{$_}++ }
while ( <FILE2> ) { chomp; $lines{$_}++ }

open (FILE3, '>', 'file3.txt') or die "Unable to open file3.txt for writing : $!";
for ( keys %lines ) {
    next if $lines{$_} > 1;
    print FILE3 "$_\n";
}
```

If you are not on a nix system with diff, or if you have not installed a nix toolkit for Win32, you can find a pure Perl implementation of diff here and sort here. You may also want to look into Perltidy. This question gets asked a lot, so you may also want to look at our Q and A section in the future as well.

Cheers - L~R
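For the case where it does matter which file an extra line came from, the hash can record the origin instead of a plain count. The following is only a sketch under assumptions: the names `classify_lines`, `IN_FIRST`, and `IN_SECOND` are made up for illustration, the data is the sample from the question held in arrays rather than read from files, and repeated lines within one file are treated as a single occurrence.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical bit flags recording which input a line was seen in.
use constant { IN_FIRST => 1, IN_SECOND => 2 };

# Build a hash mapping each distinct line to a bitmask of its origins.
# A line repeated within one input still gets only that input's flag.
sub classify_lines {
    my ($first, $second) = @_;
    my %where;
    $where{$_} |= IN_FIRST  for @$first;
    $where{$_} |= IN_SECOND for @$second;
    return \%where;
}

# Sample data from the question (assumed already chomped).
my @data1 = ('Monk Apple One', 'Sam Orange Two', 'Sue Apple One');
my @data2 = ('Sam Orange Two', 'Sue Apple One', 'Monk Apple One',
             'Mike Bannana One', 'Don Apple Two');

my $where = classify_lines(\@data1, \@data2);
for my $line (sort keys %$where) {
    my $name = (split ' ', $line)[0];
    print "$name (only in DATA1)\n" if $where->{$line} == IN_FIRST;
    print "$name (only in DATA2)\n" if $where->{$line} == IN_SECOND;
}
# prints:
# Don (only in DATA2)
# Mike (only in DATA2)
```

Unlike the counting version, a line that appears twice in the same file is not mistaken for a line common to both files.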
Re: comparing two files by TomDLux (Vicar) on Mar 09, 2004 at 16:03 UTC
If you are on a Unix platform, comm is the utility you want. It divides the lines of the two input files into 'unique to file A', 'unique to file B', and 'common to both'. Command-line flags control which of those categories are output.

-- `TTTATCGGTCGTTATATAGATGTTTGCA`
Re: Re: comparing two files by Happy-the-monk (Canon) on Mar 09, 2004 at 19:48 UTC
As the comm(1) manual says, you'd have to sort(1) the files first. There are some versions of uniq(1) that will also do it, should comm be missing.

Sören
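If comm is not available, the same three-way split can be approximated in Perl. This is a rough sketch of the idea, not a reimplementation of comm(1): the sub name `comm_like` is hypothetical, the inputs are in-memory arrays holding the sample data from the question, and both lists are sorted first because the merge only works on sorted input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk two sorted lists in step, the way comm(1) does.
# Returns three array refs: only in A, only in B, common to both.
sub comm_like {
    my ($a_lines, $b_lines) = @_;
    my (@only_a, @only_b, @both);
    my ($i, $j) = (0, 0);

    while ($i < @$a_lines && $j < @$b_lines) {
        my $cmp = $a_lines->[$i] cmp $b_lines->[$j];
        if    ($cmp < 0) { push @only_a, $a_lines->[$i++] }
        elsif ($cmp > 0) { push @only_b, $b_lines->[$j++] }
        else             { push @both, $a_lines->[$i]; $i++; $j++ }
    }
    # Whatever remains once one list runs out is unique to the other.
    push @only_a, @{$a_lines}[$i .. $#{$a_lines}];
    push @only_b, @{$b_lines}[$j .. $#{$b_lines}];

    return (\@only_a, \@only_b, \@both);
}

# Sample data from the question, sorted before comparing.
my @raw1 = ('Monk Apple One', 'Sam Orange Two', 'Sue Apple One');
my @raw2 = ('Sam Orange Two', 'Sue Apple One', 'Monk Apple One',
            'Mike Bannana One', 'Don Apple Two');
my @data1 = sort @raw1;
my @data2 = sort @raw2;

my ($only1, $only2, $both) = comm_like(\@data1, \@data2);

# Print only the name column of lines unique to the second file.
print +(split ' ', $_)[0], "\n" for @$only2;
# prints:
# Don
# Mike
```

The merge needs only one pass over each list, but the sort it depends on means the original line order is lost.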
Re: comparing two files by graff (Chancellor) on Mar 10, 2004 at 05:43 UTC
Your examples didn't make this clear... what would you want as output if your two input files were:

```
-- FILE 1 --        -- FILE 2 --
Monk apple one      Monk apple one
Punk apple two      Punk peach two
John peach three    John peach two
Jane plum three     Jack plum three
```

The question is: Which of the following best describes your task?

1. The comparison consists of using just the first column as the "key" field, and you just want to print the keys that are unique to one file or the other.
2. The comparison involves whole lines -- the first column of a line is printed if the other file does not contain an exact match for the whole line.

If your task is like the first one, you could check out a command line utility script that I posted here. If it's the latter, a different approach would be needed.
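To make the difference between the two readings concrete, here is a small sketch that runs both on the FILE 1 / FILE 2 sample above. It is not the utility script mentioned in the reply; the sub names `unique_keys` and `names_of_unmatched_lines` are invented for this illustration, and the data is held in arrays instead of being read from files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @file1 = ('Monk apple one', 'Punk apple two',
             'John peach three', 'Jane plum three');
my @file2 = ('Monk apple one', 'Punk peach two',
             'John peach two', 'Jack plum three');

# Reading 1: only the first column is the key.
# Returns keys that occur in exactly one of the two lists.
sub unique_keys {
    my ($x, $y) = @_;
    my %count;
    for my $list ($x, $y) {
        my %in_this;    # count each key at most once per list
        $in_this{ (split ' ', $_)[0] } = 1 for @$list;
        $count{$_}++ for keys %in_this;
    }
    my @unique = sort grep { $count{$_} == 1 } keys %count;
    return @unique;
}

# Reading 2: whole lines are compared; return the first column of
# every line that has no exact match in the other list.
sub names_of_unmatched_lines {
    my ($x, $y) = @_;
    my (%in_x, %in_y, %names);
    $in_x{$_} = 1 for @$x;
    $in_y{$_} = 1 for @$y;
    $names{ (split ' ', $_)[0] } = 1 for grep { !$in_y{$_} } @$x;
    $names{ (split ' ', $_)[0] } = 1 for grep { !$in_x{$_} } @$y;
    my @sorted = sort keys %names;
    return @sorted;
}

print "By key:  @{[ unique_keys(\@file1, \@file2) ]}\n";
print "By line: @{[ names_of_unmatched_lines(\@file1, \@file2) ]}\n";
# prints:
# By key:  Jack Jane
# By line: Jack Jane John Punk
```

Punk and John show up only under the second reading, because their names appear in both files while the rest of the line differs.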