
Assistance with file compare

by Karger78 (Beadle)
on Oct 28, 2009 at 19:14 UTC ( #803767=perlquestion )

Karger78 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have an odd question about comparing files that I need help figuring out. In my code, @return holds a list of files from a specific network share, and @remoteFilelist holds another list of files. The two arrays are different sizes, but I would like to compare the files from both arrays and, where they match, print that both files are equal. I am just confused about how to go about this comparison so that all files are checked and confirmed as equal or not.
    @remoteFilelist = File::Find::Rule->file()->in($remoteCpPath);
    #
    foreach (@return) {
        if(compare_text($path.@return,$path2.@remoteFilelist) == 0) {
            print "Both Files are equal\n";
        }else{
            print "file not equal copy time";
        }
    }

Replies are listed 'Best First'.
Re: Assistance with file compare
by almut (Canon) on Oct 28, 2009 at 19:46 UTC

    In case you want to compare files with the same name (that exist in both lists/directories), you could compute the intersection of both lists, and then iterate over the resulting list of file names, simply prepending the appropriate paths. Something like:

    my %seen;
    $seen{$_}++ for @return, @remoteFilelist;
    my @files_in_both_lists = grep $seen{$_} > 1, keys %seen;

    for my $fname (@files_in_both_lists) {
        if (compare_text("$path1/$fname", "$path2/$fname") == 0) {
            #...
        }
    }

    Otherwise (if you want to compare every file in list 1 with every file in list 2), I would compute checksums (e.g. MD5) for all files, and use the checksums as keys in a hash, with a list of filenames as the associated value. Any entry with more than one file in its list indicates identical files.

    Update: sample code for the latter approach:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Digest::MD5;

    my @allfiles = ...;   # your file lists merged (including paths)

    my %by_md5;

    for my $file (@allfiles) {
        open my $fh, "<", $file or die "Couldn't open '$file': $!";
        binmode $fh;
        my $md5 = Digest::MD5->new();
        $md5->addfile($fh);
        my $digest = $md5->hexdigest();
        # or ->digest() -- hexdigest is just more "dumping-friendly"...
        push @{ $by_md5{$digest} }, $file;
    }

    for my $digest (grep @{$by_md5{$_}} > 1, keys %by_md5) {
        print "duplicates: @{ $by_md5{$digest} }\n";
    }

    (In case you're paranoid and worry about the very unlikely case of a digest collision, you can always do a byte-for-byte comparison of the files with the same digest, i.e. those reported as duplicates by the above snippet.)
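The byte-for-byte check mentioned above could be sketched roughly as follows, using the core File::Compare module. This is only an illustration: the helper name confirm_identical is made up here and is not part of the original snippet. It takes the files from one digest group and returns only the sets whose contents really are identical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical helper (not from the original post): split one group of
# same-digest files into sets whose contents are byte-for-byte identical.
# Returns a list of array refs, one per set with at least two files.
sub confirm_identical {
    my @files = @_;
    my @sets;    # each element: array ref of files with identical content

    FILE: for my $file (@files) {
        for my $set (@sets) {
            # compare() returns 0 if equal, 1 if different, -1 on error
            my $rc = compare($file, $set->[0]);
            die "Couldn't compare '$file' and '$set->[0]': $!" if $rc < 0;
            if ($rc == 0) {
                push @$set, $file;
                next FILE;
            }
        }
        push @sets, [$file];    # no match so far: start a new set
    }

    return grep { @$_ > 1 } @sets;
}

# Possible use with the %by_md5 hash built above:
#   for my $digest (keys %by_md5) {
#       print "identical: @$_\n" for confirm_identical(@{ $by_md5{$digest} });
#   }
```

Since only files that already share a digest get compared, the extra pass usually touches few files.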

      Well, here is the issue: the files do not have the same name. Other than the name, the files should be exactly the same.
        So, if the names are not the same, how would you then know that file_1 is equal to file_2?

        If you run md5sum on all the files and build something like a hash (e.g. file_1: md5sum output), you can then compare the md5sums. If the md5sums are the same, are the files the same? And what are the chances that two files have the same md5sum when in reality they are not the same?
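On the chances question, a rough estimate is possible under two assumptions that are not stated in the thread: MD5 behaves like a uniformly random 128-bit function, and nobody is crafting files to collide on purpose (MD5 is known to be broken against deliberate collisions). Under those assumptions, the birthday bound gives the probability that any two of n distinct files share a digest:

```latex
P(\text{collision}) \approx \frac{n(n-1)}{2} \cdot 2^{-128}
% Example: n = 10^6 files
% \frac{n(n-1)}{2} \approx 5 \times 10^{11}, \quad 2^{128} \approx 3.4 \times 10^{38}
% P \approx 1.5 \times 10^{-27}
```

So for accidental collisions among ordinary files the chance is negligible. If the files could come from someone who wants them to collide, a byte-for-byte comparison is the safer check.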
Re: Assistance with file compare
by gmargo (Hermit) on Oct 28, 2009 at 19:36 UTC

    Do you expect the file names to match? Or do you want to compare every file from array1 to every file from array2?

      I was thinking about that. Yes, I will need to compare every file in array 1 with every file in array 2.
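For that every-file-against-every-file case, one way to avoid comparing each pair directly is to index one list by digest and look up each file of the other list. The following is a minimal sketch under assumptions: the helper names file_digest and match_by_content are invented for illustration, and both lists are assumed to hold full paths (File::Find::Rule's in() returns paths that include the starting directory).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: hex MD5 digest of one file's contents.
sub file_digest {
    my ($file) = @_;
    open my $fh, "<", $file or die "Couldn't open '$file': $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Hypothetical helper: for every file in the first list, find the files in
# the second list with the same digest. Returns a hash ref mapping each
# file of the first list to an array ref of matches (possibly empty).
sub match_by_content {
    my ($local, $remote) = @_;

    # Index the second list once, so each file is read only once.
    my %remote_by_digest;
    push @{ $remote_by_digest{ file_digest($_) } }, $_ for @$remote;

    my %matches;
    for my $file (@$local) {
        $matches{$file} = [ @{ $remote_by_digest{ file_digest($file) } || [] } ];
    }
    return \%matches;
}

# Possible use with the arrays from the question:
#   my $matches = match_by_content(\@return, \@remoteFilelist);
#   for my $file (sort keys %$matches) {
#       if (@{ $matches->{$file} }) {
#           print "$file equals: @{ $matches->{$file} }\n";
#       } else {
#           print "$file has no match, copy time\n";
#       }
#   }
```

Each file is read once, so the work grows with the total number of files rather than with the number of pairs.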

Node Type: perlquestion [id://803767]
Approved by moritz
Front-paged by tye