PerlMonks  

I have a series of scripts that control our DVD jukebox/burner. When we burn a DVD, we burn two copies of it so we have a redundant "mirror" copy. We burn them in parallel and then checksum them afterwards to make sure that they're both okay. We do this because the burn may fail if the load on the machine (in this case an Ultra 10) exceeds 2.0. Also, bad media is not entirely unheard of. This process was working fine, and took about 2 hours to burn and 2 hours to verify on a 4x burner. However, last night something came up that proved our process needed tuning.

This is the shell script we're using at the moment:

DVD=$1
B=`/usr/sbin/mount | grep "$DVD on"`
if [ "$B" = "" ]
then
    echo "DVD is not mounted. Please mount and then try again"
    exit
fi
# Checksum every file on the disc in the background.
nohup find /dvd/$DVD -type f -exec cksum {} \; >$CHECKDIR/cksum.$DVD.dvd &

This, like I said, has been working. The problem arose when last night's DVD turned out to contain about 11,000 files. For some reason, cksum(1) is rather slow.

I knew that Perl could do this with Digest::MD5 (it is used in MP3::Napster, which I use a lot). I also figured I could use File::Find to recursively traverse the directories the way the find(1) command above does. My hope was that the checksum implementation in cksum(1) is slower than the one in Digest::MD5, and also that File::Find is quicker than find(1).

I haven't benchmarked it yet, but here is the code I intend to use to replace what we're using now:

#!/usr01/aja96/perl/bin/perl
use warnings;
use strict;

use Carp;
use File::Find;
use Digest::MD5 qw{ md5_hex };
use File::Slurp;

my $dir   = shift || '.';
my $debug = '';
my @cksums;

# Called by File::Find for every entry; $_ is the name inside the
# current directory, $File::Find::name is the full path.
sub wanted {
    my $file = $_;
    return if (-d $file);
    carp "cksumming $File::Find::name ($file)\n" if $debug;

    # read_file croaks by itself on failure (its default err_mode).
    # Testing the return value for truth would wrongly reject empty files.
    my $noodle = read_file( $file );
    my %file = (
        name  => $File::Find::name,
        cksum => md5_hex( $noodle ),
    );
    push @cksums, \%file;
    carp "$file checksummed\n" if $debug;
}

find( { wanted => \&wanted, follow_fast => 1 }, $dir );
print scalar @cksums, " checksums gleaned\n";

So, I have a couple of questions. Ideally, I'd like to be able to checksum the whole volume rather than each and every file. Is this somehow possible? I seem to remember reading somewhere that it was possible to checksum a volume at a time rather than file by file. Also, how can I get more speed out of this? I need to go over 4.4 GB at a time, and it gets rather slow when the file count rises.
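
One direction that might help with both questions, as an unbenchmarked sketch: Digest::MD5 also has an object interface whose addfile() method reads from a filehandle in chunks, so a large file never has to be held in memory the way read_file does it. The helper names md5_of_path and checksum_tree are made up for this illustration, and whether this actually beats cksum(1) on the jukebox is an assumption that still needs measuring.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;
use File::Find;

# Hypothetical helper: MD5 of anything open() can read, streamed in
# chunks by addfile() instead of being slurped into memory first.
sub md5_of_path {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Hypothetical helper: one "checksum  path" line per plain file under
# $dir, sorted by path so two listings can be compared line by line.
sub checksum_tree {
    my ($dir) = @_;
    my @files;
    find(
        {
            no_chdir => 1,    # $_ then holds the full path
            wanted   => sub { push @files, $_ if -f $_ },
        },
        $dir
    );
    return map { sprintf '%s  %s', md5_of_path($_), $_ } sort @files;
}

# Each argument may be a directory tree or a single file or device node.
for my $target (@ARGV) {
    if ( -d $target ) {
        print "$_\n" for checksum_tree($target);
    }
    else {
        print md5_of_path($target), "  $target\n";
    }
}
```

Pointing it at a mounted directory gives one line per file. Pointing it at the raw device node of the drive instead of a directory would, in principle, produce a single checksum for the whole volume. One caveat with that approach: a raw read may return padding past the end of the burned image, so the two discs should be compared with each other rather than against the source image unless the read is limited to the image size.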

<!- tilly need not reply. -> brother dep

--
Laziness, Impatience, Hubris, and Generosity.


In reply to Checksumming dynamically created media (code) by deprecated
