1: # This my first posting in this section so be kind (well, unless it's for my own good ;-) ).
   2: 
   3: # The program was intended to synchronise MP3 files on my hard disk with
   4: # those on my portable MP3 jukebox thingy
   5: # (<a href="http://www.archos.com">Archos Jukebox 6000</a>).
   6: #  I rip the CDs onto my local hard disk then copy them across
   7: # to the player (which appears as a USB removable hard disk).
   8: 
   9: # I just recently noticed the a module which does a lot of the same things,
  10: # as well as lots of other things.  That makes this program yet another
  11: # re-invented wheel but I enjoyed writing it. (NOTE: I can't
  12: # remember what the module was called though!)
  13: 
  14: #!perl -w
  15: # MP3sync - synchronises MP3 files/directories between
  16: #           two directories
  17: 
  18: # Usage:
  19: #   MP3sync [configfile]
  20: # Where:
  21: #   configfile is a configuration file (as described below),
  22: #   the program defaults to reading mp3sync.cfg
  23: 
  24: # Configuration file format:
  25: #   Each line begins with a setting name, followed by '=' and
  26: #   the value to be assigned to the setting.  Case is significant.
  27: #
  28: # Settings: [default in square brackets]
  29: #   Source - directory with (possibly new) MP3s
  30: #     [C:/My Documents/My Music/]
  31: #   Dest - directory where MP3s should be copied (if necessary)
  32: #     [G:/]
  33: # (each of these specifies a path which should be the root directory
  34: #  of the tree containing the files and directories to be synchronised)
  35: 
  36: # important stuff
  37: use strict;
  38: use diagnostics;
  39: 
  40: # we want to calculate checksums for files
  41: use Digest::MD5;
  42: # for reading directories
  43: use IO::Dir;
  44: # for copying files
  45: use File::Copy;
  46: 
  47: # Default configuration file name
  48: my $configFile = "mp3sync.cfg";
  49: 
  50: # Hash to hold configuration settings
  51: my %config = (
  52:   Source => 'C:/My Document/My music/',
  53:   Dest => 'G:/'
  54: );
  55: 
  56: # Sub: ReadConfig - reads in a configuration file
  57: # Usage:
  58: #   $status = ReadConfig($configFile, \%configHash);
  59: # Where:
  60: #   $status will be true on successfull reading of the file.
  61: #   (error messages will be output to STDERR)
  62: #   $configFile is the name of the file to read
  63: #   \%ConfigHash is a reference to a hash holding configuration
  64: #     settings
  65: sub ReadConfig {
  66:   my $filename = shift;
  67:   my $configHash = shift;
  68:   # check the parameters - filename first
  69:   if (!defined($filename)) {
  70:     warn("You must supply a filename to ReadConfig");
  71:     return undef;
  72:   };
  73:   # has the user specified a hash?
  74:   if (!defined($configHash)) {
  75:     warn("You must supply a configuration hash reference to ReadConfig");
  76:     return undef;
  77:   };
  78:   # is it really a hash ref?
  79:   if (ref($configHash) ne "HASH") {
  80:     warn("The second parameter to ReadConfig must be a hash reference");
  81:     return undef;
  82:   };
  83:   # open the file - warn that we're using the defaults if it can't be opened
  84:   unless (open(CONFFILE, '<', $filename)) {
  85:     warn("Couldn't open config file $filename: $!\n");
  86:     warn("Using defaults");
  87:     return 1;
  88:   };
  89:   # now read each line
  90:   my $line;
  91:   while (defined($line = <CONFFILE>)) {
  92:     # if it's a valid line store it in the hash
  93:     if ($line =~ /(\w+)\=(.+)/) {
  94:       $configHash->{$1} = $2;
  95:     };
  96:     # ignore invalid lines silently
  97:   };
  98:   # close the file
  99:   unless (close(CONFFILE)) {
 100:     WARN("Couldn't close config file $filename: $!\n");
 101:     return undef;
 102:   };
 103:   # success!
 104:   return 1;
 105: };
 106: 
 107: # Sub: PopulateTree
 108: # Usage:
 109: #   $status = PopulateTree($path, \%hash);
 110: # Where:
 111: #   $status will be true if the function succeeded
 112: #   $path is the directory to examine (terminate in '/')
 113: #   \%hash is a reference to a hash to be filled with directory
 114: #     tree information.  Each item in the hash will be a hash
 115: #     containing the following keys:
 116: #       Type (files and dirs) - 'File' or 'Dir'
 117: #       Contents (dirs only)  - an array containing the directory
 118: #                               contents, in this form
 119: #       Digest (files only)   - the MD5 digest of the file
 120: sub PopulateTree {
 121:   # get our parameters
 122:   my $path = shift;
 123:   if (!defined($path)) {
 124:     warn("PopulateTree needs a path");
 125:     return undef;
 126:   };
 127:   my $hashref = shift;
 128:   if (!defined($hashref)) {
 129:     warn("PopulateTree needs a hash reference");
 130:     return undef;
 131:   };
 132:   # check that $hashref really is a hash reference
 133:   if (ref($hashref) ne 'HASH') {
 134:     warn("PopulateTree needs a HASH reference");
 135:     print ref($hashref), "\n";
 136:     return undef;
 137:   };
 138:   
 139:   print "Looking in $path...\n";
 140: 
 141:   # create a Digest::MD5 object
 142:   my $digest = new Digest::MD5;
 143:   
 144:   # read the directory
 145:   my $dir = new IO::Dir;
 146:   my @dirContents;
 147:   unless ($dir->open($path)) {
 148:     warn("Couldn't open directory $path: $!\n");
 149:     return undef;
 150:   };
 151:   unless (@dirContents = $dir->read()) {
 152:     warn("Couldn't read directory $path: $!\n");
 153:     return undef;
 154:   };
 155:   # now look at each item and decide what to do with it
 156:   ITEM:
 157:   foreach my $item (@dirContents) {
 158:     if (($item eq '.') || ($item eq '..')) {
 159:       next ITEM;
 160:     };
 161:     if (-d $path.$item) {
 162:       # it's a directory, create an item entry for it
 163:       $hashref->{$item} = {
 164:         Type => 'Dir',
 165:         Contents => {}
 166:       };
 167:       # get its contents
 168:       unless (PopulateTree($path.$item."/", $hashref->{$item}->{Contents})) {
 169:         warn("PopulateTree failed");
 170:         return undef;
 171:       };
 172:     } elsif (-f _) {
 173:       # it's an ordinary file, pass it to Digest::MD5
 174:       unless (open(*FILE, '<', $path.$item)) {
 175:         warn("Couldn't open $path$item: $!\n");
 176:         return undef;
 177:       };
 178:       $digest->addfile(*FILE);
 179:       my $fileDigest = $digest->hexdigest();
 180:       unless (close(*FILE)) {
 181:         warn("Couldn't close $path$item: $!\n");
 182:         return undef;
 183:       };
 184:       # create an item entry for it
 185:       $hashref->{$item} = {
 186:         Type => 'File',
 187:         Digest => $fileDigest
 188:       };
 189:     };
 190:     # ignore other item types
 191:   };
 192:    # success
 193:    return 1;
 194: };
 195: 
 196: 
 197: # Sub: CompareTrees
 198: # Usage:
 199: #   ($status, @doList) =
 200: #     CompareTrees($sourcePath,$destPath,\%sourceTree, \%destTree);
 201: # Where:
 202: #   $status will be true if the function succeeded.
 203: #   @doList will contain a list of things to do to make the two trees
 204: #     identical, or be undefined on failure.  Each item in the list will be
 205: #     a hash containing two keys: 'Action' and 'Detail'.  'Action' will be
 206: #     either 'Copy' or 'Create', 'Detail' will be the full path to create (for
 207: #     'Create') or a two item array (for 'Copy').  The first item is the source
 208: #     file, the second is the file to copy it to both have full paths.
 209: #   $sourcePath is the full path to the source tree root
 210: #   $destPath is the full path to the destination tree root
 211: #   \%sourceTree is a reference to a tree as returned by PopulateTree.
 212: #   \%destTree is a reference to a tree as returned by PopulateTree.
 213: # Note:
 214: #   The destTree hash is modified by this function, keep a copy of it if
 215: #   you want the original data.
 216: sub CompareTrees {
 217:   # get the paths
 218:   my $sourcePath = shift;
 219:   unless ($sourcePath) {
 220:     warn("You must provide a source path to CompareTrees");
 221:     return undef;
 222:   };
 223:   my $destPath = shift;
 224:   unless ($destPath) {
 225:     warn("You must provide a destination path to CompareTrees");
 226:     return undef;
 227:   };
 228:   # get our two tree refs
 229:   my $sourceRef = shift;
 230:   unless ($sourceRef) {
 231:     warn("You must provide a source tree reference to CompareTrees");
 232:     return undef;
 233:   };
 234:   my $destRef = shift;
 235:   unless ($destRef) {
 236:     warn("You must provide a destination tree reference to CompareTrees");
 237:     return undef;
 238:   };
 239:   # check that they are hash refs
 240:   if ((ref($sourceRef) ne 'HASH') || (ref($destRef) ne 'HASH')) {
 241:     warn("CompareTrees takes two hash references");
 242:     return undef;
 243:   };
 244:   
 245:   # an array to return our result in
 246:   my @retArray = ();
 247:   my $item;
 248:   # look at each item contained in the source tree
 249: CMPITEM:
 250:   foreach $item (keys(%{$sourceRef})) {
 251:     # is there a corresponding item in the destination tree?
 252:     if (defined($destRef->{$item})) {
 253:       # if it's a directory we know it exists, if it's a file we want to
 254:       # check it's identical
 255:       if ($destRef->{$item}->{Type} eq 'File') {
 256:         # compare the MD5 digests
 257:         if ($destRef->{$item}->{Digest} ne $sourceRef->{$item}->{Digest}) {
 258:           # indicate that we want to make a copy
 259:           push(@retArray, {
 260:             Action => 'Copy',
 261:             Detail => [$sourcePath.$item, $destPath]
 262:           });
 263:         };
 264:       };
 265:       # no further action is required, look at the next item
 266:       next CMPITEM;
 267:     } else {
 268:       # if it's a directory then we need to create it, if it's a file we need
 269:       # to copy it
 270:       if ($sourceRef->{$item}->{Type} eq 'File') {
 271:         # indicate that we want to make a copy
 272:         push(@retArray, {
 273:           Action => 'Copy',
 274:           Detail => [$sourcePath.$item, $destPath.$item]
 275:         });
 276:       } else {
 277:         # it's a directory, indicate that we want to create it
 278:         push(@retArray, {
 279:           Action => 'Create',
 280:           Detail => $destPath.$item
 281:         });
 282:         # create it in the destination tree
 283:         $destRef->{$item} = {
 284:           Contents => {},
 285:           Type => 'Dir'
 286:         };
 287:         # we also need to recurse down into it
 288:         my ($status, @tempArray) = CompareTrees(
 289:           $sourcePath.$item.'/',
 290:           $destPath.$item.'/',
 291:           $sourceRef->{$item}->{Contents},
 292:           $destRef->{$item}->{Contents});
 293:         unless ($status) {
 294:           warn("CompareTrees recursion failed");
 295:       	  return undef;
 296:       	};
 297:       	push(@retArray, @tempArray);
 298:       }; # item type is 'Dir'
 299:     }; # !defined($destRef->{$item})
 300:   }; # foreach
 301:   # success
 302:   return (1, @retArray);
 303: };
 304: 
 305: 
 306: # Sub: DoActions - carry out the actions to sync two trees
 307: # Usage:
 308: #   $status = DoActions(@actionArray);
 309: # Where:
 310: #   $status will be true if successful
 311: #   @actionArray is an array of actions as returned by CompareTrees
 312: sub DoActions {
 313:   # get the array
 314:   my @actions = @_;
 315:   # carry out the actions
 316:   foreach my $action (@actions) {
 317:     # what type of action is it?
 318:     if ($action->{Action} eq 'Copy') {
 319:       # it's a copy, get the source and destination
 320:       print "Copying $action->{Detail}->[0]...\n";
 321:       unless (copy($action->{Detail}->[0], $action->{Detail}->[1])) {
 322:         warn("Couldn't copy $action->{Detail}->[0] to $action->{Detail}->[1]: $!\n");
 323:         return undef;
 324:       };
 325:     } else {
 326:       # it's a create, get the directory name
 327:       print "Creating $action->{Detail}...\n";
 328:       unless(mkdir $action->{Detail}) {
 329:         warn("Couldn't create directory $action->{Detail}: $!\n");
 330:         return undef;
 331:       };
 332:     };
 333:   }; # foreach Action
 334:   return 1;
 335: };
 336: 
 337: 
 338: #
 339: # Main
 340: #
 341: 
 342: # read the config file
 343: ReadConfig($configFile, \%config) or die("Couldn't read config file $configFile\n");
 344: 
 345: # storage for the two trees
 346: my (%sourceTree, %destTree);
 347: 
 348: # read the source tree
 349: PopulateTree($config{Source}, \%sourceTree) or die("Couldn't read source tree\n");
 350: # and the destination tree
 351: PopulateTree($config{Dest}, \%destTree) or die("Couldn't read dest tree\n");
 352: 
 353: # find the differences
 354: my ($status, @actions) =
 355:   CompareTrees($config{Source}, $config{Dest}, \%sourceTree, \%destTree)
 356:     or die("Couldn't compare trees\n");
 357: 
 358: # check whether there are any actions
 359: if (scalar(@actions) == 0) {
 360:   print "Nothing to do, source and destination trees are identical.\n";
 361: };
 362: 
 363: DoActions(@actions) or die("Sync failed to create files/directories.\n");
 364: 
 365: print "Finished.";
 366: 
 367: # So what do you think?
 368: # Kevin O'Rourke.

Replies are listed 'Best First'.
Re (tilly) 1: File synchronisation (for criticism)
by tilly (Archbishop) on Jun 23, 2001 at 05:09 UTC
    I strongly recommend going with rsync instead of rolling your own code. Often the most important lesson for programmers - and the hardest - to learn is the value of researching existing wheels before setting out to build your own. Often there is no way that the home-cobbled one will match the solid ones already in mass-production, no assembly required...

      rsync doesn't seem to be available for Win32, but I get your point.

      Kevin O'Rourke

        Actually it is. You need Cygwin first, and then you can follow these instructions to get it running. Google also turned up a patch that may make life somewhat easier.
Re: File synchronisation (for criticism)
by Brovnik (Hermit) on Jun 22, 2001 at 19:11 UTC
    (Hi to a fellow Cambridge person...)
    Note that these are mostly style issues, aimed at readability.

    You are doing a lot of checking of parameters. 2 ways I use to make this clearer (and 1 other comment) are :

    1. Use prototypes, e.g.
      sub CompareTrees($$\%\%)
      Perl will check that you are getting 2 scalars and 2 hashrefs and generate an error for you if not met. If the caller provides a hash, not a ref, Perl also generates the ref for you - DWIM.
    2. Count the parameters passed in. e.g.
      sub CompareTrees { @_ == 4 || warn "CompareTrees($srcPath,$destPath,\%srcTree, \%destTree)" && return undef; my ($sourcePath,$destPath,$sourceTree, $destTree) = @_;
      I think that is clearer and quicker to read and doesn't reduce reduce the usability.
    3. You are checking for errors within CompareTrees and then returning undef, only to die in the caller if you return the undef. If you are going to die anyway, you may as well do it where you find the error.
      sub CompareTrees { @_ == 4 || die "CompareTrees($srcPath,$destPath,\%srcTree,\%destTree)"; my ($sourcePath,$destPath,$sourceTree, $destTree) = @_;

      Personally, if I am checking in one place, I don't check in the other unless I am feeling paranoid, and try to throw the exception as close to the error as possible. This just makes the code flow more easily.
    Config::IniFiles will do the Config file reading for you.
    --
    Brovnik