After reading your comment (and adapting slightly from hardburn's comment), I came up with the following code using hashes (as mentioned above, and with the same cautions). It handles both the case of an entry in file 2 but not file 1 and multiple occurrences of an entry within a file (by listing the line numbers in the results). It does not, however, report a difference in the number of occurrences of an entry between the two files. (Data files adapted from those in the comment by ikegami.)

#!/usr/bin/perl -w
use strict;

if ( scalar(@ARGV) < 2 ) {
    print "Usage:\n\t$0 file1 file2\n\n";
    die;
}

my @filename = ( $ARGV[0], $ARGV[1] );
my (@content);

# For each file, map every distinct line to the line numbers where it occurs.
foreach my $i ( 0, 1 ) {
    open( DF, $filename[$i] )
        or die("Can't open $filename[$i] for input: $!\n");
    while (<DF>) {
        chomp;
        push( @{ $content[$i]{$_} }, $. );
    }
    close(DF);
}

# Note which file has fewer distinct entries, if the counts differ.
my @keycount = (
    scalar( keys( %{ $content[0] } ) ),
    scalar( keys( %{ $content[1] } ) )
);
if ( $keycount[0] != $keycount[1] ) {
    my @differential = @filename;
    if ( $keycount[0] > $keycount[1] ) {
        @differential = reverse(@filename);
    }
    print "Fewer values detected in ", $differential[0], " than ",
        $differential[1], "\n";
}

# Report entries found in both files, then remove them from both hashes.
foreach my $k ( sort( keys( %{ $content[0] } ) ) ) {
    if ( defined( $content[1]{$k} ) ) {
        print $k, "\n";
        foreach ( 0, 1 ) {
            print "\tFound in ", $filename[$_], " at line(s): ",
                join( ', ', @{ $content[$_]{$k} } ), "\n";
            delete( $content[$_]{$k} );
        }
    }
}

# Whatever remains is unique to one file or the other.
@keycount = (
    scalar( keys( %{ $content[0] } ) ),
    scalar( keys( %{ $content[1] } ) )
);
if ( $keycount[0] or $keycount[1] ) {
    foreach ( 0, 1 ) {
        if ( $keycount[$_] ) {
            print "Found in ", $filename[$_], " but not in ",
                $filename[ ( $_ + 1 ) % 2 ], ":\n";
            foreach my $k ( sort( keys( %{ $content[$_] } ) ) ) {
                print "\t'", $k, "' at line(s): ",
                    join( ', ', @{ $content[$_]{$k} } ), "\n";
                delete( $content[$_]{$k} );
            }
        }
    }
}

Sample input files:

$ cat file1.txt
qwerty
snakegod
ebrine
tarot

$ cat file2.txt
snakegod
ordo rosae moriatur
tarot
wrath of hibernia

$ cat file3.txt
qwerty
snakegod
ebrine
tarot
qwerty

$ cat file4.txt
qwerty

Sample execution runs:

$ perl cdiff.pl file1.txt file1.txt
ebrine
        Found in file1.txt at line(s): 3
        Found in file1.txt at line(s): 3
qwerty
        Found in file1.txt at line(s): 1
        Found in file1.txt at line(s): 1
snakegod
        Found in file1.txt at line(s): 2
        Found in file1.txt at line(s): 2
tarot
        Found in file1.txt at line(s): 4
        Found in file1.txt at line(s): 4

$ perl cdiff.pl file1.txt file3.txt
ebrine
        Found in file1.txt at line(s): 3
        Found in file3.txt at line(s): 3
qwerty
        Found in file1.txt at line(s): 1
        Found in file3.txt at line(s): 1, 5
snakegod
        Found in file1.txt at line(s): 2
        Found in file3.txt at line(s): 2
tarot
        Found in file1.txt at line(s): 4
        Found in file3.txt at line(s): 4

$ perl cdiff.pl file1.txt file2.txt
snakegod
        Found in file1.txt at line(s): 2
        Found in file2.txt at line(s): 1
tarot
        Found in file1.txt at line(s): 4
        Found in file2.txt at line(s): 3
Found in file1.txt but not in file2.txt:
        'ebrine' at line(s): 3
        'qwerty' at line(s): 1
Found in file2.txt but not in file1.txt:
        'ordo rosae moriatur' at line(s): 2
        'wrath of hibernia' at line(s): 4

$ perl cdiff.pl file1.txt file4.txt
Fewer values detected in file4.txt than file1.txt
qwerty
        Found in file1.txt at line(s): 1
        Found in file4.txt at line(s): 1
Found in file1.txt but not in file4.txt:
        'ebrine' at line(s): 3
        'snakegod' at line(s): 2
        'tarot' at line(s): 4
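As noted at the top, the script does not flag an entry that occurs a different number of times in each file (such as qwerty in file1.txt versus file3.txt). One possible way to add that check is sketched below. It is a standalone illustration rather than a patch to the script; the names read_positions and count_mismatches are made up for this example, and it uses lexical filehandles with three-argument open instead of the bareword DF handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map each distinct line of a file to the list of line numbers where it
# occurs (the same structure the script builds in @content).
# Hypothetical helper name, introduced only for this sketch.
sub read_positions {
    my ($filename) = @_;
    open( my $fh, '<', $filename )
        or die("Can't open $filename for input: $!\n");
    my %positions;
    while ( my $line = <$fh> ) {
        chomp $line;
        push( @{ $positions{$line} }, $. );
    }
    close($fh);
    return \%positions;
}

# Return a hash ref of entry => [ count in first, count in second ] for
# every entry present in both files whose occurrence counts differ.
# Hypothetical helper name, introduced only for this sketch.
sub count_mismatches {
    my ( $first, $second ) = @_;
    my %mismatch;
    foreach my $k ( keys %{$first} ) {
        next unless exists $second->{$k};
        my @n = ( scalar @{ $first->{$k} }, scalar @{ $second->{$k} } );
        $mismatch{$k} = \@n if $n[0] != $n[1];
    }
    return \%mismatch;
}

# Only run the report when two file names are given on the command line.
if ( @ARGV == 2 ) {
    my @positions = map { read_positions($_) } @ARGV[ 0, 1 ];
    my $mismatch  = count_mismatches(@positions);
    foreach my $k ( sort keys %{$mismatch} ) {
        print "'$k' occurs $mismatch->{$k}[0] time(s) in $ARGV[0] but ",
            "$mismatch->{$k}[1] time(s) in $ARGV[1]\n";
    }
}
```

Run against file1.txt and file3.txt from the samples, this should report that 'qwerty' occurs 1 time in file1.txt but 2 times in file3.txt. If folded into the original script, the comparison would have to happen before the delete calls empty the hashes.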

Hope that helps.


In reply to Re^2: comparing 2 files problem by atcroft
in thread comparing 2 files problem by mosh
