After reading your comment (and adapting slightly from hardburn's comment), I came up with the following code using hashes (as mentioned above, and with the same cautions). It handles both an entry that appears in file 2 but not in file 1 and multiple occurrences of an entry within a file (by listing the line numbers in the results). It does not, however, report a difference in the number of occurrences of an entry between the two files. (Data files adapted from those in the comment by ikegami.)

#!/usr/bin/perl -w
use strict;

if ( scalar(@ARGV) < 2 ) {
    # Report usage on STDERR and exit non-zero in one step.
    die "Usage:\n\t$0 file1 file2\n\n";
}

my @filename = ( $ARGV[0], $ARGV[1] );
my (@content);

foreach my $i ( 0, 1 ) {
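    # Three-argument open with a lexical filehandle.
    open( my $fh, '<', $filename[$i] )
      or die("Can't open $filename[$i] for input: $!\n");
    while (<$fh>) {
        chomp;
        # Record every line number ($.) at which this entry appears.
        push( @{ $content[$i]{$_} }, $. );
    }
    close($fh);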
}

my @keycount = (
    scalar( keys( %{ $content[0] } ) ),
    scalar( keys( %{ $content[1] } ) )
);
if ( $keycount[0] != $keycount[1] ) {
    my @differential = @filename;
    if ( $keycount[0] > $keycount[1] ) {
        @differential = reverse(@filename);
    }
    print "Fewer values detected in ", $differential[0],
      " than ", $differential[1], "\n";
}
foreach my $k ( sort( keys( %{ $content[0] } ) ) ) {
    if ( defined( $content[1]{$k} ) ) {
        print $k, "\n";
        foreach ( 0, 1 ) {
            print "\tFound in ", $filename[$_], " at line(s): ",
              join( ', ', @{ $content[$_]{$k} } ), "\n";
            delete( $content[$_]{$k} );
        }
    }
}
@keycount = (
    scalar( keys( %{ $content[0] } ) ),
    scalar( keys( %{ $content[1] } ) )
);
if ( $keycount[0] or $keycount[1] ) {
    foreach ( 0, 1 ) {
        if ( $keycount[$_] ) {
            print "Found in ", $filename[$_], " but not in ",
              $filename[ ( $_ + 1 ) % 2 ], ":\n";
            foreach my $k ( sort( keys( %{ $content[$_] } ) ) ) {
                print "\t'", $k, "' at line(s): ",
                  join( ', ', @{ $content[$_]{$k} } ), "\n";
                delete( $content[$_]{$k} );
            }
        }
    }
}

Sample input files:

$ cat file1.txt
qwerty
snakegod
ebrine
tarot
$ cat file2.txt
snakegod
ordo rosae moriatur
tarot
wrath of hibernia
$ cat file3.txt
qwerty
snakegod
ebrine
tarot
qwerty
$ cat file4.txt
qwerty

Sample execution runs:

$ perl cdiff.pl file1.txt file1.txt
ebrine
        Found in file1.txt at line(s): 3
        Found in file1.txt at line(s): 3
qwerty
        Found in file1.txt at line(s): 1
        Found in file1.txt at line(s): 1
snakegod
        Found in file1.txt at line(s): 2
        Found in file1.txt at line(s): 2
tarot
        Found in file1.txt at line(s): 4
        Found in file1.txt at line(s): 4
$ perl cdiff.pl file1.txt file3.txt
ebrine
        Found in file1.txt at line(s): 3
        Found in file3.txt at line(s): 3
qwerty
        Found in file1.txt at line(s): 1
        Found in file3.txt at line(s): 1, 5
snakegod
        Found in file1.txt at line(s): 2
        Found in file3.txt at line(s): 2
tarot
        Found in file1.txt at line(s): 4
        Found in file3.txt at line(s): 4
$ perl cdiff.pl file1.txt file2.txt
snakegod
        Found in file1.txt at line(s): 2
        Found in file2.txt at line(s): 1
tarot
        Found in file1.txt at line(s): 4
        Found in file2.txt at line(s): 3
Found in file1.txt but not in file2.txt:
        'ebrine' at line(s): 3
        'qwerty' at line(s): 1
Found in file2.txt but not in file1.txt:
        'ordo rosae moriatur' at line(s): 2
        'wrath of hibernia' at line(s): 4
$ perl cdiff.pl file1.txt file4.txt
Fewer values detected in file4.txt than file1.txt
qwerty
        Found in file1.txt at line(s): 1
        Found in file4.txt at line(s): 1
Found in file1.txt but not in file4.txt:
        'ebrine' at line(s): 3
        'snakegod' at line(s): 2
        'tarot' at line(s): 4
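
The one case the script leaves out, a differing number of occurrences of an entry in the two files, could be handled along the following lines. This is only a rough sketch, not part of the script above: the helper name occurrence_differences is made up for illustration, and it assumes its arguments have the same shape as $content[0] and $content[1] (each entry mapped to an array reference of line numbers). The sample data mirrors file1.txt and file3.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): given two hash
# references mapping each entry to an array reference of line numbers,
# return a hash reference of the entries present in both whose
# occurrence counts differ, mapped to [count_in_first, count_in_second].
sub occurrence_differences {
    my ( $first, $second ) = @_;
    my %diff;
    foreach my $k ( keys %{$first} ) {
        next unless exists $second->{$k};    # only entries found in both
        my $n1 = scalar @{ $first->{$k} };
        my $n2 = scalar @{ $second->{$k} };
        $diff{$k} = [ $n1, $n2 ] if $n1 != $n2;
    }
    return \%diff;
}

# Same shape as the per-file hashes after reading file1.txt and file3.txt.
my %file1 = ( qwerty => [1], snakegod => [2], ebrine => [3], tarot => [4] );
my %file3 = ( qwerty => [ 1, 5 ], snakegod => [2], ebrine => [3], tarot => [4] );

my $diff = occurrence_differences( \%file1, \%file3 );
foreach my $k ( sort keys %{$diff} ) {
    print "'$k' occurs $diff->{$k}[0] time(s) in the first file and ",
      "$diff->{$k}[1] time(s) in the second\n";
}
```

In the script above, a check like this would have to run before the loop that prints the matches, since that loop deletes each matched key from both hashes as it goes.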

Hope that helps.

In reply to Re^2: comparing 2 files problem by atcroft
in thread comparing 2 files problem by mosh