I have a series of scripts that control our DVD jukebox/burner. When we burn a DVD, we burn two copies of it so we have a "mirror" redundant copy. We burn them in parallel and then checksum them afterwards to make sure that they're both okay. We do this because sometimes, if the load on the machine (in this case an Ultra 10) exceeds 2.0, the burn may fail. Also, bad media is not entirely unheard of. This process was working fine, and took about 2 hours to burn and 2 hours to verify on a 4x burner. However, last night, something came up that proved our process needed tuning.

This is the shell script we're using at the moment:

DVD=$1
B=`/usr/sbin/mount | grep "$DVD on"`
if [ "$B" = "" ]
then
    echo "DVD is not mounted. Please mount and then try again"
    exit
fi
nohup find /dvd/$DVD -type f -exec cksum {} \; > $CHECKDIR/cksum.$DVD.dvd &
This, like I said, has been working. The problem arose when this particular DVD contained about 11,000 files: the cksum(1) pass became painfully slow. (Part of that is likely find's -exec forking a fresh cksum process for every single file.)

I knew that Perl could do this with Digest::MD5 (it is used in MP3::Napster, which I use a lot). I also figured I could use File::Find to recursively traverse the directories the way the find(1) command above does. My hope was that Digest::MD5's checksumming would be faster than cksum's, and that File::Find's traversal would be quicker than find(1)'s.

So I haven't benchmarked it yet, but here is the code I intend to use to replace the code we're using:

#!/usr01/aja96/perl/bin/perl

use warnings;
use strict;
use Carp;
use File::Find;
use Digest::MD5 qw{ md5_hex };
use File::Slurp;

my $dir   = shift || '.';
my $debug = '';
my @cksums;

sub wanted {
    my $file = $_;                      # basename; $File::Find::name is the full path
    return if -d $file;
    carp "cksumming $File::Find::name\n" if $debug;
    my $noodle = read_file( $file );    # read_file() croaks on error by default
    my %file = (
        name  => $File::Find::name,
        cksum => md5_hex( $noodle ),
    );
    push @cksums, \%file;
    carp "$file checksummed\n" if $debug;
}

find( { wanted => \&wanted, follow_fast => 1 }, $dir );
print scalar @cksums, " checksums gleaned\n";
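
One caveat with the version above: File::Slurp pulls each file wholly into memory before md5_hex ever sees it, which will hurt on big files. Digest::MD5 can read a filehandle itself in fixed-size chunks via its addfile method. Here is a minimal sketch of the same script rewritten that way (untested, and assuming a Perl recent enough for three-argument open and lexical filehandles):

#!/usr01/aja96/perl/bin/perl

use warnings;
use strict;
use Carp;
use File::Find;
use Digest::MD5;

my $dir = shift || '.';
my @cksums;

sub wanted {
    return if -d;                       # -d tests $_ by default
    open my $fh, '<', $_ or croak "$_ unreadable: $!\n";
    binmode $fh;                        # raw bytes, not text
    # addfile() reads the handle to EOF in chunks, so memory use
    # stays flat no matter how large the file is.
    push @cksums, {
        name  => $File::Find::name,
        cksum => Digest::MD5->new->addfile($fh)->hexdigest,
    };
    close $fh;
}

find( { wanted => \&wanted, follow_fast => 1 }, $dir );
print scalar @cksums, " checksums gleaned\n";

Whether the hashing itself beats cksum(1) is exactly what a benchmark would tell you, but at least this version won't balloon on a single huge file.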
So, I have a couple of questions. Ideally, I'd like to be able to checksum the whole volume rather than each and every file. Is this somehow possible? I seem to remember reading somewhere that it was possible to checksum a volume at a time rather than each file. Also, how can I get more speed out of this? I need to go over 4.4GB at a time, and it gets rather slow when the file count rises.
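
On the whole-volume idea: since each disc is one ISO 9660 image, the usual trick is to checksum the raw device instead of walking the mounted filesystem, which sidesteps the per-file overhead entirely. The sketch below uses a hypothetical raw device path; substitute whatever your drive really is, and note that some drives pad reads past the end of the session, so both mirrors need to be read the same way for their sums to match:

#!/usr01/aja96/perl/bin/perl

use warnings;
use strict;
use Digest::MD5;

# '/dev/rdsk/c0t6d0s2' is a placeholder -- point this at the
# raw device for your DVD drive.
my $dev = shift || '/dev/rdsk/c0t6d0s2';

open my $fh, '<', $dev or die "can't read $dev: $!\n";
binmode $fh;
print Digest::MD5->new->addfile($fh)->hexdigest, "  $dev\n";
close $fh;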

<!-- tilly need not reply. --> brother dep

--
Laziness, Impatience, Hubris, and Generosity.


