in reply to File comparison: not diff

Here's a little program I whipped up that should do about what you want. It should be easy to modify so you can plug it into your project. I tested it on the input you gave, but not extensively.
```perl
#!/usr/bin/perl
use strict;
use warnings;

@ARGV == 2 or die "Must specify 2 files!\n";
my $afile = shift;
my $bfile = shift;

my $ahash = make_hash($afile);
my $bhash = make_hash($bfile);

# Read "key,value" lines from $file into a hash of key => value.
sub make_hash {
    my $file = shift;
    my %hash = ();
    open my $in, '<', $file or die "Can't open '$file': $!\n";
    while (<$in>) {
        chomp;
        # Limit of 2: everything after the first comma is the value.
        my ($key, $val) = split /,/, $_, 2;
        $hash{$key} = $val;
    }
    close $in;
    return \%hash;
}

print "In A hash but not B hash:\n",
    map {"$_\n"} grep {not exists $bhash->{$_}} sort keys %$ahash;
print "In B hash but not A hash:\n",
    map {"$_\n"} grep {not exists $ahash->{$_}} sort keys %$bhash;
print "In A hash and B hash but different:\n",
    map {"$_\n"} grep {exists $bhash->{$_} and $ahash->{$_} ne $bhash->{$_}}
        sort keys %$ahash;
```
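If holding both files in memory is too expensive, one possible middle ground (a sketch, not part of the original post) is to hash only the first file, storing a 16-byte MD5 digest of each value from the core `Digest::MD5` module instead of the value itself, and then stream the second file line by line, deleting matched keys as it goes. `compare_streaming` is a hypothetical name. Like the program above, it assumes one `key,value` record per line and unique keys; a key repeated in B would be reported as "only in B" on its second occurrence.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical helper: hash file A as key => MD5(value), then stream file B.
# Returns array refs: keys only in A, keys only in B, keys whose values differ.
sub compare_streaming {
    my ($afile, $bfile) = @_;
    my %seen;    # key => 16-byte binary MD5 digest of A's value

    open my $a_fh, '<', $afile or die "Can't open '$afile': $!\n";
    while (my $line = <$a_fh>) {
        chomp $line;
        my ($key, $val) = split /,/, $line, 2;
        $seen{$key} = md5(defined $val ? $val : '');
    }
    close $a_fh;

    my (@only_b, @differ);
    open my $b_fh, '<', $bfile or die "Can't open '$bfile': $!\n";
    while (my $line = <$b_fh>) {
        chomp $line;
        my ($key, $val) = split /,/, $line, 2;
        my $digest = delete $seen{$key};    # remove the key once it is matched
        if (!defined $digest) {
            push @only_b, $key;
        }
        elsif ($digest ne md5(defined $val ? $val : '')) {
            push @differ, $key;
        }
    }
    close $b_fh;

    # Keys never matched by a line of B exist only in A.
    return ([sort keys %seen], [sort @only_b], [sort @differ]);
}

if (@ARGV == 2) {
    my ($only_a, $only_b, $differ) = compare_streaming(@ARGV);
    print "In A but not B:\n",         map {"$_\n"} @$only_a;
    print "In B but not A:\n",         map {"$_\n"} @$only_b;
    print "In both but different:\n",  map {"$_\n"} @$differ;
}
```

Memory then scales with the number of keys in A (key plus 16 bytes plus Perl's per-entry overhead) rather than with the full line length of both files.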

Re: Answer: File comparison: not diff
by rschuler (Beadle) on Apr 09, 2002 at 20:23 UTC
    Thanks, thelenm, I like your answer. I hadn't thought of using hashes.

    But I see a problem: the files are large, about 500,000 lines each, with lines up to 1,275 characters long (about 250 on average). That's roughly 500,000 x 250 x 2 = 250,000,000 bytes (about 238 MB) of raw data in the two hashes, before Perl's per-entry overhead, and the data will grow in the future. I think that will be rather stressful for the machine it's running on :) Instead, I'll try sorting the files beforehand and keeping only two lines in memory at a time, something like:

```perl
# code fragment leaving out parsing, reporting, use strict etc.
my ($leftkey, $leftvalue)   = split_line(scalar <LEFTFILE>);
my ($rightkey, $rightvalue) = split_line(scalar <RIGHTFILE>);

# work through both files sequentially, matching advances by key
while (defined($leftkey) and defined($rightkey)) {
    my $compare = $leftkey cmp $rightkey;
    if ($compare == 0) {
        if ($leftvalue ne $rightvalue) {
            value_diff($leftkey, $leftvalue, $rightkey, $rightvalue);
        }
        ($leftkey, $leftvalue)   = split_line(scalar <LEFTFILE>);
        ($rightkey, $rightvalue) = split_line(scalar <RIGHTFILE>);
        next;
    }
    elsif ($compare > 0) {
        missing_left($rightkey, $rightvalue);
        ($rightkey, $rightvalue) = split_line(scalar <RIGHTFILE>);
        next;
    }
    else {
        missing_right($leftkey, $leftvalue);
        ($leftkey, $leftvalue) = split_line(scalar <LEFTFILE>);
        next;
    }
}

# are there missing items at the end of the files?
while (defined($leftkey)) {
    missing_right($leftkey, $leftvalue);
    ($leftkey, $leftvalue) = split_line(scalar <LEFTFILE>);
}
while (defined($rightkey)) {
    missing_left($rightkey, $rightvalue);
    ($rightkey, $rightvalue) = split_line(scalar <RIGHTFILE>);
}
```
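For reference, here is one way the fragment might be filled in as a complete script. The helper names `split_line`, `value_diff`, `missing_left` and `missing_right` come from the fragment, but their bodies here are assumptions, and `merge_compare` is a hypothetical wrapper that takes lexical filehandles instead of the barewords. The merge is only correct if both files are sorted by key in the same byte order Perl's `cmp` uses and keys are unique; with a Unix `sort`, something like `LC_ALL=C sort -t, -k1,1` should give that order, whereas a locale-aware sort may not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one "key,value" line. Returns () at EOF so the caller's key becomes
# undef; the value is everything after the first comma ('' if there is none).
sub split_line {
    my $line = shift;
    return () unless defined $line;
    chomp $line;
    my ($key, $val) = $line =~ /^([^,]*),?(.*)$/;
    return ($key, $val);
}

# Reporting helpers (assumed bodies): print one line per finding.
sub value_diff    { my ($key) = @_; print "DIFF $key\n" }
sub missing_left  { my ($key) = @_; print "ONLY_RIGHT $key\n" }   # absent from left
sub missing_right { my ($key) = @_; print "ONLY_LEFT $key\n" }    # absent from right

# Walk two key-sorted handles in step, holding one record from each.
sub merge_compare {
    my ($lfh, $rfh) = @_;
    my ($lkey, $lval) = split_line(scalar <$lfh>);
    my ($rkey, $rval) = split_line(scalar <$rfh>);
    while (defined $lkey and defined $rkey) {
        my $cmp = $lkey cmp $rkey;
        if ($cmp == 0) {
            value_diff($lkey, $lval, $rkey, $rval) if $lval ne $rval;
            ($lkey, $lval) = split_line(scalar <$lfh>);
            ($rkey, $rval) = split_line(scalar <$rfh>);
        }
        elsif ($cmp > 0) {
            missing_left($rkey, $rval);
            ($rkey, $rval) = split_line(scalar <$rfh>);
        }
        else {
            missing_right($lkey, $lval);
            ($lkey, $lval) = split_line(scalar <$lfh>);
        }
    }
    # Drain whichever file still has records.
    while (defined $lkey) {
        missing_right($lkey, $lval);
        ($lkey, $lval) = split_line(scalar <$lfh>);
    }
    while (defined $rkey) {
        missing_left($rkey, $rval);
        ($rkey, $rval) = split_line(scalar <$rfh>);
    }
}

if (@ARGV == 2) {
    open my $left,  '<', $ARGV[0] or die "Can't open '$ARGV[0]': $!\n";
    open my $right, '<', $ARGV[1] or die "Can't open '$ARGV[1]': $!\n";
    merge_compare($left, $right);
}
```

Because only the current record from each file is held, memory use stays roughly constant no matter how large the files grow; the cost moves into the external sort.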