compare two files

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: compare two files by monarch (Priest) on Jul 24, 2007 at 13:18 UTC
This is my guess of how to implement my interpretation of your query:

```perl
use strict;
use warnings;

# get parameters from command line
my $fname1   = shift;
my $fname2   = shift;
my $fnameout = shift;

# read in all numbers from file 2 into a hash
open( my $fin, '<', $fname2 ) or die( "Cannot open $fname2: $!" );
my %exclude_num = ();
while ( defined( my $line = <$fin> ) ) {
    # remove trailing newlines
    $line =~ s/[\r\n]+\z//s;
    # store number in hash
    $exclude_num{$line} = 1;
}
close( $fin );

# read in numbers from file 1, skipping excluded numbers
open( $fin, '<', $fname1 ) or die( "Cannot open $fname1: $!" );
open( my $fout, '>', $fnameout ) or die( "Cannot create $fnameout: $!" );
while ( defined( my $line = <$fin> ) ) {
    # remove trailing newlines
    $line =~ s/[\r\n]+\z//s;
    # skip excluded numbers
    next if ( $exclude_num{$line} );
    print( $fout "$line\n" );
}
close( $fout );
close( $fin );
```
Re: compare two files by wojtyk (Friar) on Jul 24, 2007 at 13:48 UTC
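If you're trying to alter file1 to be the set difference of the two files (the lines of file1 that don't appear in file2), you could use this to do it in place, without creating a temp file:

```perl
use strict;
use warnings;
use Tie::File;

my %seen;
tie my @file1, 'Tie::File', 'file1' or die "Cannot tie file1: $!";
tie my @file2, 'Tie::File', 'file2' or die "Cannot tie file2: $!";

# Tie::File strips the record separator from each element by default
# (autochomp), so no chomp is needed
$seen{$_}++ for @file2;

# keep only the lines of file1 that never appear in file2;
# assigning to the tied array rewrites file1 in place
@file1 = grep { !$seen{$_} } @file1;

untie(@file1);
untie(@file2);
```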
Re^2: compare two files by citromatik (Curate) on Jul 24, 2007 at 15:27 UTC
Once you have both files as lists, you can use List::MoreUtils to test for uniqueness:

```perl
use strict;
use warnings;
use Tie::File;
use List::MoreUtils qw(uniq);

tie my @file1, 'Tie::File', 'file1' or die "Cannot tie file1: $!";
tie my @file2, 'Tie::File', 'file2' or die "Cannot tie file2: $!";

# uniq returns each distinct line once, in order of first appearance
print join( "\n", uniq( @file1, @file2 ) ), "\n";
```
Re: compare two files by dsheroh (Monsignor) on Jul 24, 2007 at 14:46 UTC
If the files are sorted, your quickest option will probably be to take the first line from each file, print the line from file1 if they're different or advance a line in file2 if they're the same, and then advance a line in file1. This is also very space-efficient, since you only need to have one line from each file in memory at a time. A basic implementation of this would be:

Note that this implementation assumes that there are no values in file2 which are not also present in file1 and that neither file contains any duplicates.

If the files are not sorted (and you're not going to be using them repeatedly), then a hash-based solution such as others have proposed would probably be faster than sorting them and using this method.
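A minimal sketch of the sorted-merge approach described above (not dsheroh's collapsed code). It assumes both files hold one integer per line in ascending numeric order; `print_difference` is a hypothetical helper name. It also advances file2 past values smaller than the current file1 value, so values in file2 that are missing from file1 do no harm, and repeated values in file1 are skipped as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print to $out every line of $in1 that does not occur in $in2.
# Assumes both inputs hold one integer per line, sorted in ascending
# numeric order; only one line from each input is in memory at a time.
sub print_difference {
    my ( $in1, $in2, $out ) = @_;

    # Read the next line with its line ending removed; undef at end of file.
    my $next = sub {
        my ($fh) = @_;
        my $line = <$fh>;
        return unless defined $line;
        $line =~ s/[\r\n]+\z//;
        return $line;
    };

    my $l1 = $next->($in1);
    my $l2 = $next->($in2);
    while ( defined $l1 ) {
        # Skip file2 values smaller than the current file1 value;
        # they cannot match anything later in file1.
        $l2 = $next->($in2) while defined $l2 && $l2 < $l1;

        # Print the file1 value unless file2's current value equals it.
        # file2 is not advanced on a match, so repeats in file1 are
        # skipped too.
        print {$out} "$l1\n" unless defined $l2 && $l2 == $l1;

        $l1 = $next->($in1);
    }
    return;
}

# Command-line entry point (only runs when given two file names).
if ( @ARGV == 2 ) {
    my ( $fname1, $fname2 ) = @ARGV;
    open( my $in1, '<', $fname1 ) or die "Cannot open $fname1: $!";
    open( my $in2, '<', $fname2 ) or die "Cannot open $fname2: $!";
    print_difference( $in1, $in2, \*STDOUT );
    close($in1);
    close($in2);
}
```

Run it as `perl merge_diff.pl file1 file2 > file1.uniq` (the script name is hypothetical). Comparing with `<` and `==` requires numerically sorted files; for lexically sorted files, `lt` and `eq` would be the matching operators.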
Re: compare two files by citromatik (Curate) on Jul 24, 2007 at 15:33 UTC
If you are only trying to get the job done, a single line of shell is sufficient (or two lines if the files are not sorted):

```sh
# if the files are not sorted, sort them first;
# join needs its input sorted lexically (not numerically) on the join field
$ sort file1 > file1.sorted
$ sort file2 | join -v 1 file1.sorted - > file1.uniq
```
Re: compare two files by leocharre (Priest) on Jul 24, 2007 at 13:26 UTC
I am guessing your files would be:

File a: 123123123
File b: 123123

Where do you want to take the numbers from, the start or the end? What generates these files? Are only `\d` digits present in the files? Should your program freak out if non-digit characters are present?