I have a series of scripts that controls our DVD jukebox/burner. When we burn a DVD, we burn two copies of it so we have a "mirror" redundant copy. We burn them in parallel and then checksum them afterwards to make sure that they're both okay. We do this because the burn may fail if the load exceeds 2.0 on the machine (in this case an Ultra 10), and bad media is not entirely unheard of, either. This process was working fine, taking about 2 hours to burn and 2 hours to verify on a 4x burner. Last night, however, something came up that proved our process needed tuning.
<readmore>
<p>
This is the shell script we're using at the moment:
<code>
DVD=$1
B=`/usr/sbin/mount | grep "$DVD on"`
if [ -z "$B" ]
then
echo "DVD is not mounted. Please mount and then try again" >&2
exit 1
fi
nohup find /dvd/$DVD -type f -exec cksum {} \; >$CHECKDIR/cksum.$DVD.dvd &
</code>
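Since the whole point of the mirror is catching a bad burn, it might also be worth comparing the two copies' checksum lists directly rather than just generating them. A sketch (the `verify_mirrors` helper is hypothetical, and the demo directories stand in for the two mount points):

```shell
#!/bin/sh
# Sketch only: verify_mirrors is a hypothetical helper, not part of the
# production script. It checksums every file under each mount point and
# diffs the sorted lists, so a bad burn on either disc shows up at once.
verify_mirrors() {
    ( cd "$1" && find . -type f -exec cksum {} \; | sort ) > /tmp/cksum.a
    ( cd "$2" && find . -type f -exec cksum {} \; | sort ) > /tmp/cksum.b
    if diff /tmp/cksum.a /tmp/cksum.b > /dev/null
    then
        echo "mirrors match"
    else
        echo "MISMATCH between $1 and $2" >&2
        return 1
    fi
}

# Demo with two throwaway directories standing in for the mounts:
D1=`mktemp -d`; D2=`mktemp -d`
echo "some data" > "$D1/file"; echo "some data" > "$D2/file"
verify_mirrors "$D1" "$D2"
```

The `sort` matters because find(1) makes no ordering guarantees, so the same files can come back in different orders on the two mounts.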
This, like I said, has been working. The problem arose when this particular DVD contained about 11,000 files. cksum (1) turned out to be rather slow at that scale, not least because find's <code>-exec</code> forks a fresh process for every single file.
<p>
I knew that Perl had some features to do this with [cpan://Digest::MD5] (it is used in [cpan://MP3::Napster], which I use a lot). I also figured I could use [cpan://File::Find] to traverse the directories recursively, like the find (1) command above does. My hope was that the checksum implementation in [cpan://Digest::MD5] would be faster than the one in cksum, and that the traversal in [cpan://File::Find] would be quicker than find (1)'s.
<p>
So I haven't benchmarked it yet, but here is the code I intend to use to replace the code we're using:
<code>
#!/usr01/aja96/perl/bin/perl
use warnings;
use strict;
use Carp;
use File::Find;
use Digest::MD5 qw{ md5_hex };
use File::Slurp;
my $dir = shift || '.';
my $debug = '';
my @cksums;
sub wanted {
my $file = $_;
return unless -f $file;    # skip directories and other non-plain files
carp "cksumming $file\n" if $debug;
# File::Slurp croaks on failure by default, so there is no need to test
# the return value -- `or croak' would also (wrongly) die on an empty
# file. Read raw bytes so I/O layers cannot perturb the checksum.
my $noodle = read_file( $file, binmode => ':raw' );
my %file = (
name => $file,
cksum => md5_hex( $noodle ),
);
push @cksums, \%file;
carp "$file checksummed\n" if $debug;
}
find( { wanted => \&wanted, follow_fast => 1 }, $dir );
print scalar @cksums, " checksums gleaned\n";
</code>
So, I have a couple of questions. Ideally, I'd like to be able to checksum the whole volume rather than each and every file. Is this somehow possible? I seem to remember reading somewhere that it was. Also, how can I get more speed out of this? I need to go over 4.4 GB at a time, and it gets rather slow as the file count rises.
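On the whole-volume idea, my current guess is that one sequential read of the raw device (or of the ISO image the disc was burned from) would do it. A sketch, untested against the actual jukebox; the device path is a made-up Solaris-style example, and a scratch file stands in so the sketch can run anywhere:

```shell
#!/bin/sh
# Sketch: checksum the whole volume in one sequential read instead of
# 11,000 per-file cksum runs. VOL would be the drive's raw device; the
# path below is an assumption -- adjust for your hardware. A scratch
# file stands in when no such device exists.
VOL=${1:-/dev/rdsk/c0t6d0s0}
[ -r "$VOL" ] || { VOL=`mktemp`; printf 'stand-in volume' > "$VOL"; }
dd if="$VOL" bs=1024k 2>/dev/null | cksum
```

One caveat I'd want to check: two discs burned from the same image only give identical device-level checksums if the burner pads them identically, so the per-file sums may still be needed for cross-checking the mirrors against each other.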
<p>
<!- tilly need not reply. ->
brother [deprecated|dep]
<p>--
<br>Laziness, Impatience, Hubris, and <i>Generosity</i>.