PerlMonks  

Re: Comparison Of Files

by Anonymous Monk
on Dec 05, 2000 at 23:46 UTC ([id://45065])


in reply to Comparison Of Files

With 700 lines you can easily read one file into a hash and then compare each row in the other file against that hash.
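For the small-file case, a minimal sketch of that hash approach might look like the following. The function name compare_files_with_hash is hypothetical (it is not part of the original post). It loads the first file into a hash, then streams the second file and checks each line against the hash. It treats the first file as a set, so a line repeated there is reported at most once.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post): reads $file1 into a hash, then
# checks each line of $file2 against it. Returns two array refs: lines
# found only in $file1 (sorted) and lines found only in $file2 (file order).
sub compare_files_with_hash
{
    my ($file1, $file2) = @_;

    # Load every line of the first file as a hash key (newline removed).
    open(my $fh1, '<', $file1) or die "$0: Could not open $file1: $!\n";
    my %in_file1;
    while (my $line = <$fh1>)
    {
        chomp $line;
        $in_file1{$line} = 1;
    }
    close $fh1;

    # Stream the second file, remembering which file-1 lines were matched.
    open(my $fh2, '<', $file2) or die "$0: Could not open $file2: $!\n";
    my (%matched, @in2only);
    while (my $line = <$fh2>)
    {
        chomp $line;
        if (exists $in_file1{$line})
        {
            $matched{$line} = 1;
        }
        else
        {
            push @in2only, $line;
        }
    }
    close $fh2;

    # Whatever in file 1 was never matched exists only there.
    my @in1only = grep { !$matched{$_} } sort keys %in_file1;

    return (\@in1only, \@in2only);
}
```

The lines unique to the first file are sorted before returning because hash order is arbitrary; the second file's unique lines keep their original order.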

With larger files (1 MB and up), you can save a lot of memory by noticing that the files appear to be sorted alphabetically. Also, in this case most of the lines are present in both files, so storing only the differing lines will not consume insane amounts of memory :-)
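Exploiting the sort order only works if both files really are sorted the way Perl's cmp operator orders strings. A hedged way to confirm that assumption first is a one-pass check like the one below; the name is_sorted_file is hypothetical. It keeps only the previous line in memory, so it stays cheap even for large files.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post): returns 1 if every line of
# $file is ge the line before it under Perl's cmp, 0 otherwise.
# Only the previous line is held in memory.
sub is_sorted_file
{
    my ($file) = @_;
    open(my $fh, '<', $file) or die "$0: Could not open $file: $!\n";
    my $previous;
    while (my $line = <$fh>)
    {
        chomp $line;
        if (defined $previous and ($previous cmp $line) > 0)
        {
            close $fh;
            return 0;    # found a line out of order
        }
        $previous = $line;
    }
    close $fh;
    return 1;
}
```

Note that cmp compares character codes, so a file with "Zebra" before "apple" counts as sorted here, while a case-insensitive "alphabetical" order would not satisfy the merge.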

Here is a mergesort-style way to do it, reading both files in step as in the merge phase of mergesort:
=head1 compare_sorted_files_by_line($filename1, $filename2)

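Finds lines that are present in only one of the files, whose names are
given as arguments. This function assumes that the lines in both files are
sorted in the string order of Perl's cmp operator (roughly alphabetical,
but with uppercase ASCII letters sorting before lowercase ones).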

Returns the unique lines of each file, with newlines removed, as two array
references. The first one points to an array containing the lines that are
present in $filename1 only, and the second one similarly for $filename2.

Returns an empty list if either of the files could not be opened for reading.

=cut

sub compare_sorted_files_by_line
{
    my ($filename1, $filename2) = @_;

    # The lines found in only one of the files are collected here.
    my (@in1only, @in2only);

    open(my $fh1, '<', $filename1)
        or do { warn "$0: Could not open $filename1: $!\n"; return (); };
    open(my $fh2, '<', $filename2)
        or do { warn "$0: Could not open $filename2: $!\n"; close $fh1; return (); };

    # Read the next line with its newline removed, so that a last line
    # lacking a trailing newline still compares equal to the same text.
    my $next_line = sub {
        my $fh = shift;
        my $line = <$fh>;
        chomp $line if defined $line;
        return $line;
    };

    my $line1 = $next_line->($fh1);
    my $line2 = $next_line->($fh2);

    # Walk both files in step, like the merge phase of mergesort.
    while (defined $line1 and defined $line2)
    {
        my $compare = $line1 cmp $line2;
        if ($compare == 0)
        {
            # Present in both files: advance both.
            $line1 = $next_line->($fh1);
            $line2 = $next_line->($fh2);
        }
        elsif ($compare > 0)
        {
            # $line2 sorts first, so it cannot appear later in file 1.
            push @in2only, $line2;
            $line2 = $next_line->($fh2);
        }
        else
        {
            # $line1 sorts first, so it cannot appear later in file 2.
            push @in1only, $line1;
            $line1 = $next_line->($fh1);
        }
    }

    # Whatever remains in one file after the other ran out is unique to it.
    while (defined $line1)
    {
        push @in1only, $line1;
        $line1 = $next_line->($fh1);
    }
    while (defined $line2)
    {
        push @in2only, $line2;
        $line2 = $next_line->($fh2);
    }

    close $fh1;
    close $fh2;

    return (\@in1only, \@in2only);
}
-Bass