in reply to Generic compare script.

If you are seeking to determine whether files on host.A are identical to files on host.B, create a single process that will run separately on each host, use File::Find to locate all data files, use Digest::MD5 to create an MD5 signature (checksum) for each file, and output a list of file names and checksums.
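For what it's worth, here is a minimal sketch of that per-host lister. The sub name `checksum_tree` and the "checksum, tab, relative path" output format are my own assumptions, not anything the original question requires; paths are printed relative to the base directory so that lists from hosts with different base paths still line up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Hypothetical helper: return a hash ref mapping each regular file under
# $base (path relative to $base) to the hex MD5 digest of its content.
sub checksum_tree {
    my ($base) = @_;
    ( my $prefix = $base ) =~ s{/+\z}{};    # ignore trailing slashes on the base
    my %sum;
    find(
        {
            no_chdir => 1,                  # stay put, so full names stay openable
            wanted   => sub {
                my $name = $File::Find::name;
                return unless -f $name;
                open my $fh, '<', $name
                    or do { warn "skipping $name: $!\n"; return };
                binmode $fh;                # checksum raw bytes, also on Windows
                ( my $rel = $name ) =~ s{^\Q$prefix\E/+}{};
                $sum{$rel} = Digest::MD5->new->addfile($fh)->hexdigest;
                close $fh;
            },
        },
        $base
    );
    return \%sum;
}

# When given a base directory, print one "checksum<TAB>path" line per file,
# sorted by path so the output is stable from run to run.
if (@ARGV) {
    my $sum = checksum_tree( $ARGV[0] );
    print "$sum->{$_}\t$_\n" for sort keys %$sum;
}
```

Run it on each host with the base directory as the only argument and redirect the output to a per-host list file.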

Then write a separate script (much simpler) to compare the lists of file names and checksums from the two hosts, to report (1) files on A not on B, (2) files on B not on A, and (3) files with same name on each host but different content.
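The comparison step could look something like the sketch below. It assumes each list file holds "checksum, tab, path" lines (my assumed format, not a standard one), and the sub names `read_list` and `compare_lists` are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read "checksum<TAB>path" lines into a hash ref of path => checksum.
sub read_list {
    my ($file) = @_;
    open my $fh, '<', $file or die "$file: $!";
    my %sum;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $md5, $path ) = split /\t/, $line, 2;
        next unless defined $path;    # skip blank or malformed lines
        $sum{$path} = $md5;
    }
    close $fh;
    return \%sum;
}

# Sort paths into the three report categories:
# only on A, only on B, and present on both with different checksums.
sub compare_lists {
    my ( $a_sum, $b_sum ) = @_;
    my ( @only_a, @only_b, @differ );
    for my $path ( sort keys %$a_sum ) {
        if    ( !exists $b_sum->{$path} )            { push @only_a, $path }
        elsif ( $a_sum->{$path} ne $b_sum->{$path} ) { push @differ, $path }
    }
    @only_b = grep { !exists $a_sum->{$_} } sort keys %$b_sum;
    return ( \@only_a, \@only_b, \@differ );
}

# When given two list files (host A first, host B second), print the report.
if ( @ARGV == 2 ) {
    my ( $only_a, $only_b, $differ ) =
        compare_lists( map { read_list($_) } @ARGV );
    print "only on A: $_\n" for @$only_a;
    print "only on B: $_\n" for @$only_b;
    print "differs:   $_\n" for @$differ;
}
```

Files that are identical on both hosts produce no output at all, so an empty report means the two trees match.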

Bear in mind that the only reason you would need to write a Perl script for this is so you could do it easily on the Windows machines. The standard tools on any Unix box can already do it all with a couple of simple command lines:

    # on host.A:
    find /base/dir/path -type f -print0 | xargs -0 md5 > host.A.checksum.list
    # and likewise on host.B, then put both list files in one place and try:
    diff host.A.checksum.list host.B.checksum.list

Actually, these tools (as well as a couple different Bourne-like shells, bash and ksh) have been ported to Windows (look for Cygwin, AT&T Research Labs "UWIN", maybe others), so you could do shell commands like the ones above on all your machines.

In case diff makes the list comparison a bit too opaque for you, I wrote a handy list-compare utility that might help with the last step, and posted it here: cmpcol.

(update: It may be that I have misread your post. For the case of the same file name existing on two hosts, the plan I suggested will only report whether they are identical or not. If you actually want to describe the nature of differences, this plan will at least tell you which files need to be inspected in closer detail, which will save you a lot of time and trouble. Ideally, you would have just a few pairs of files that need to be fetched into a common location, and you can use "diff" on them, or whatever.)

Re^2: Generic compare script.
by TeraMarv (Beadle) on Oct 12, 2005 at 05:50 UTC
    Yes, that is exactly what I want to do, but I was trying not to spread my code around; I wanted a single script to do the lot.

    Maybe that is not going to be possible???

    I will check out the ports of the unix commands.

    Many Thanks.

      I wanted a single script to do the lot

      With the unix tools installed on all machines, you could have a single script on a single machine that does something like this:

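      my $find = "find /path -type f -print0 | xargs -0 md5";
      for my $host ( @hostlist ) {
          # one checksum list per host, written on the local machine
          open my $out, '>', "$host.md5list" or die "$host.md5list: $!";
          # the single quotes make the whole pipeline run on the remote host
          print $out `ssh $host '$find'`;
          close $out;
      }
      # compare lists here, if you like, or use a separate script/tool to do that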
      That assumes that you have the appropriate authentication keys for using ssh without a password to connect to each host. Other methods are possible for the connections, of course.

      (updated the script to include "xargs -0", and to run the md5 part on the remote host, where it belongs -- note the single quotes around $find in the ssh command line.)

      (another update: I should confess that I have no clue how you would actually execute a shell script on a remote windows machine... good luck with that.)

        Thanks mate.......that looks like it will do the job. It's infinitely simpler than my first idea! You have saved me loads of time...cheers!
        Thanks mate, this looks like it will do the job in an infinitely simpler way than I was envisaging.
        I really need to brush up on my *nix.

        Cheers :o)