After reading your comment (and adapting slightly from hardburn's comment), I came up with the following code using hashes (as mentioned above, and with the same cautions). It handles both the case of an entry in file 2 but not in file 1, and multiple occurrences of an entry in a file (by listing the line numbers in the results). It does not, however, report differences in the number of occurrences of an entry between the two files. (Data files adapted from those in the comment by ikegami.)

#!/usr/bin/perl -w
use strict;

if ( scalar(@ARGV) < 2 ) {
    print "Usage:\n\t$0 file1 file2\n\n";
    die;
}

my @filename = ( $ARGV[0], $ARGV[1] );
my (@content);

foreach my $i ( 0, 1 ) {
    open( DF, $filename[$i] )
      or die("Can't open $filename[$i] for input: $!\n");
    while (<DF>) {
        chomp;
        push( @{ $content[$i]{$_} }, $. );
    }
    close(DF);
}

my @keycount = (
    scalar( keys( %{ $content[0] } ) ),
    scalar( keys( %{ $content[1] } ) )
);
if ( $keycount[0] != $keycount[1] ) {
    my @differential = @filename;
    if ( $keycount[0] > $keycount[1] ) {
        @differential = reverse(@filename);
    }
    print "Fewer values detected in ", $differential[0], " than ",
      $differential[1], "\n";
}

foreach my $k ( sort( keys( %{ $content[0] } ) ) ) {
    if ( defined( $content[1]{$k} ) ) {
        print $k, "\n";
        foreach ( 0, 1 ) {
            print "\tFound in ", $filename[$_], " at line(s): ",
              join( ', ', @{ $content[$_]{$k} } ), "\n";
            delete( $content[$_]{$k} );
        }
    }
}

@keycount = (
    scalar( keys( %{ $content[0] } ) ),
    scalar( keys( %{ $content[1] } ) )
);
if ( $keycount[0] or $keycount[1] ) {
    foreach ( 0, 1 ) {
        if ( $keycount[$_] ) {
            print "Found in ", $filename[$_], " but not in ",
              $filename[ ( $_ + 1 ) % 2 ], ":\n";
            foreach my $k ( sort( keys( %{ $content[$_] } ) ) ) {
                print "\t'", $k, "' at line(s): ",
                  join( ', ', @{ $content[$_]{$k} } ), "\n";
                delete( $content[$_]{$k} );
            }
        }
    }
}
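In case the data structure is not obvious: each file ends up as a hash whose keys are the line contents and whose values are array references holding the line numbers where that content appeared. Here is a minimal sketch of just that step, assuming an in-memory string in place of a real file. The sub name index_lines is made up for illustration and is not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map each distinct line of $text to the list of
# line numbers where it occurs (the same shape as $content[$i] above).
sub index_lines {
    my ($text) = @_;
    my %seen;

    # Open the string as an in-memory file so that $. counts lines
    # exactly as it does for a file on disk.
    open( my $fh, '<', \$text ) or die("Can't open in-memory file: $!\n");
    while ( my $line = <$fh> ) {
        chomp $line;

        # Autovivification creates the array the first time a key is seen.
        push( @{ $seen{$line} }, $. );
    }
    close($fh);
    return \%seen;
}

# Contents of file3.txt from the samples in this post.
my $index = index_lines("qwerty\nsnakegod\nebrine\ntarot\nqwerty\n");
foreach my $k ( sort( keys( %{$index} ) ) ) {
    print $k, ": ", join( ', ', @{ $index->{$k} } ), "\n";
}
```

For the file3.txt data this should list qwerty with lines 1 and 5 and every other entry with a single line number.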

Sample input files:

$ cat file1.txt
qwerty
snakegod
ebrine
tarot

$ cat file2.txt
snakegod
ordo rosae moriatur
tarot
wrath of hibernia

$ cat file3.txt
qwerty
snakegod
ebrine
tarot
qwerty

$ cat file4.txt
qwerty

Sample execution runs:

$ perl cdiff.pl file1.txt file1.txt
ebrine
    Found in file1.txt at line(s): 3
    Found in file1.txt at line(s): 3
qwerty
    Found in file1.txt at line(s): 1
    Found in file1.txt at line(s): 1
snakegod
    Found in file1.txt at line(s): 2
    Found in file1.txt at line(s): 2
tarot
    Found in file1.txt at line(s): 4
    Found in file1.txt at line(s): 4

$ perl cdiff.pl file1.txt file3.txt
ebrine
    Found in file1.txt at line(s): 3
    Found in file3.txt at line(s): 3
qwerty
    Found in file1.txt at line(s): 1
    Found in file3.txt at line(s): 1, 5
snakegod
    Found in file1.txt at line(s): 2
    Found in file3.txt at line(s): 2
tarot
    Found in file1.txt at line(s): 4
    Found in file3.txt at line(s): 4

$ perl cdiff.pl file1.txt file2.txt
snakegod
    Found in file1.txt at line(s): 2
    Found in file2.txt at line(s): 1
tarot
    Found in file1.txt at line(s): 4
    Found in file2.txt at line(s): 3
Found in file1.txt but not in file2.txt:
    'ebrine' at line(s): 3
    'qwerty' at line(s): 1
Found in file2.txt but not in file1.txt:
    'ordo rosae moriatur' at line(s): 2
    'wrath of hibernia' at line(s): 4

$ perl cdiff.pl file1.txt file4.txt
Fewer values detected in file4.txt than file1.txt
qwerty
    Found in file1.txt at line(s): 1
    Found in file4.txt at line(s): 1
Found in file1.txt but not in file4.txt:
    'ebrine' at line(s): 3
    'snakegod' at line(s): 2
    'tarot' at line(s): 4
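As for the gap I mentioned (differing occurrence counts), something along these lines might cover it. This is only a sketch, assuming the same entry-to-line-numbers hashes as in the script. The sub name count_mismatches is hypothetical, and the two hashes are typed in by hand to mirror file1.txt and file3.txt rather than read from disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given two hash refs mapping entry => [line numbers],
# return a hash of the entries present in both whose counts differ.
# Each value is [count in first, count in second].
sub count_mismatches {
    my ( $first, $second ) = @_;
    my %mismatch;
    foreach my $k ( keys( %{$first} ) ) {
        next unless exists( $second->{$k} );
        my $n1 = scalar( @{ $first->{$k} } );
        my $n2 = scalar( @{ $second->{$k} } );
        $mismatch{$k} = [ $n1, $n2 ] if ( $n1 != $n2 );
    }
    return %mismatch;
}

# Hand-built stand-ins for $content[0] and $content[1]
# after reading file1.txt and file3.txt.
my %file1 = ( qwerty => [1],      snakegod => [2], ebrine => [3], tarot => [4] );
my %file3 = ( qwerty => [ 1, 5 ], snakegod => [2], ebrine => [3], tarot => [4] );

my %diff = count_mismatches( \%file1, \%file3 );
foreach my $k ( sort( keys(%diff) ) ) {
    print "'$k' occurs $diff{$k}[0] time(s) in file1.txt, ",
      "$diff{$k}[1] time(s) in file3.txt\n";
}
```

If you bolt this onto the script above, the check would have to run before the loop that prints the common entries, since that loop deletes each key from both hashes as it goes.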

Hope that helps.