in reply to Storing program settings and state

First of all, when copying/verifying files, all the persistent data you need is the files themselves - you verify each file, and if it is invalid or does not exist, you (re)download it. rsync is a very good program that does exactly that, so this wheel has already been invented :-) I don't know whether you, as a mere user, can install and run it, or whether it needs to be run as root, though...
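
A rough sketch of that verify-then-refetch loop, in case it helps. The checksum manifest and the base URL are made up for illustration - in practice the expected digests would come from wherever the files originate:

use strict;
use warnings;
use Digest::MD5;
use LWP::Simple qw(getstore);

my $base_url = 'http://example.com/files';    # assumed source of the files

# filename => expected MD5 digest (illustrative values)
my %manifest = (
    'part1.bin' => 'd41d8cd98f00b204e9800998ecf8427e',
    'part2.bin' => '0cc175b9c0f1b6a831c399e269772661',
);

for my $file (sort keys %manifest) {
    next if -e $file and md5_of($file) eq $manifest{$file};
    warn "$file missing or invalid, fetching again\n";
    getstore("$base_url/$file", $file);
}

# MD5 of a local file, or '' if it cannot be opened.
sub md5_of {
    my ($file) = @_;
    open my $fh, '<', $file or return '';
    binmode $fh;
    return Digest::MD5->new->addfile($fh)->hexdigest;
}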

perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web

Replies are listed 'Best First'.
Re: Re: Storing program settings and state
by John M. Dlugosz (Monsignor) on Jul 09, 2003 at 18:20 UTC
    No, my situation is that I can't copy a whole file at all. Suppose you have a 7Gb file and a link that goes down after 20 seconds. So, I want to treat the file like a bunch of small files (with a configurable chunk size). I also intend to make it portable, not Windows-only (and obviously Unix-only is not of interest to me).

    To copy a bunch of small files, I would just use the command line copy source dest /u and it would continue where it left off, on a whole-file basis. The resource kit program robocopy does something similar: it keeps trying until it works. Verifying raises the same issue, since it requires reading the source again.

    My niche is different: the files are so large (relative to the link's reliability) that a whole-file copy would never finish.
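
    A minimal sketch of that chunk-at-a-time resume, with made-up file names and a 1 MB chunk size; rerunning it after each interruption picks up at the destination's current size:

    use strict;
    use warnings;
    use Fcntl qw(SEEK_SET);

    my ($src, $dst) = ('huge-source.dat', 'huge-dest.dat');    # placeholders
    my $chunk_size  = 1024 * 1024;                             # configurable

    # Resume where the previous (interrupted) attempt stopped.
    my $offset = -e $dst ? -s $dst : 0;

    open my $in,  '<',  $src or die "open $src: $!";
    open my $out, '>>', $dst or die "open $dst: $!";
    binmode $_ for $in, $out;

    seek $in, $offset, SEEK_SET or die "seek $src: $!";

    my $buf;
    while (1) {
        my $read = read $in, $buf, $chunk_size;
        die  "read $src: $!" unless defined $read;
        last unless $read;
        print {$out} $buf or die "write $dst: $!";
    }
    close $out or die "close $dst: $!";

    Verifying can work the same way in reverse: re-read both files chunk by chunk, starting from the last offset that compared equal.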

      Actually, rsync is intended for exactly that. Not only does it copy the file in chunks; when the file on the other end changes and you need to recopy it, it copies only the chunks that have changed. From the rsync features page:
      rsync uses the "rsync algorithm" which provides a very fast method for bringing remote files into sync. It does this by sending just the differences in the files across the link, without requiring that both sets of files are present at one of the ends of the link beforehand.
      Suppose you have a 7Gb file and a link that goes down after 20 seconds.

      I'd look into getting the link fixed.

      obviously Unix-only is not of interest to me

      I'm sure rsync will run on Windows, at least in a cygwin environment. (I wouldn't be surprised if there were a native port.)

      Finally, I agree with Corion's assessment that all of the needed information should be in the files themselves. There should be no reason to write metadata elsewhere.
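
      To make the "just use rsync" suggestion concrete, here's a rough retry wrapper (host and paths invented; --partial keeps a partially transferred file around so the next attempt can build on it instead of starting from scratch):

      use strict;
      use warnings;

      my @cmd = ('rsync', '--partial', '--timeout=30', '-v',
                 'remotehost:/archive/huge.dat', '/local/huge.dat');

      my $tries = 0;
      until (system(@cmd) == 0) {
          die "giving up after $tries attempts\n" if ++$tries >= 100;
          sleep 10;    # give the flaky link a moment before retrying
      }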

      -sauoq
      "My two cents aren't worth a dime.";
      
        After starting to read the rsync paper, I see why y'all are not seeing my problem. This is not two peers that can compare notes. It is ONE computer and a remote storage system. Reading the file to compare it is just as problematic as copying it!

        In particular, I'm trying to get files off a Maxstore FireWire enclosure. It transfers for a while and then has to be power-cycled.

      If you don't end up simply using rsync as suggested (which I highly recommend; rsync is very good), you should at least read about the rsync diff algorithm. It's fairly simple (I even made a test implementation in Perl, long, long ago) and quite effective. They put a lot of thought into solving the problem of efficiently transferring large files over a flaky link, and the solution they came up with is really nice. If you're not going to use their program, you might as well use their algorithm. :)

      Update: Here's a link to a paper describing the rsync algorithm, for your reference. This is just one copy of many... a quick Google search will show you more.

      Update 2: Looking around at the rsync docs, I stumbled upon Andrew Tridgell's PhD Thesis. Chapters 3, 4, and 5 discuss rsync in great detail. Very interesting reading, if I do say so myself. :)
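
      For the curious, here is a toy Perl rendering of just the weak "rolling" checksum from that paper (the strong per-block checksum and the block-matching table are omitted); it shows why sliding the window one byte at a time is nearly free:

      use strict;
      use warnings;

      my $M = 1 << 16;

      # Weak checksum of one block, computed from scratch.
      sub weak_sum {
          my @x = unpack 'C*', $_[0];
          my ($a, $b) = (0, 0);
          for my $i (0 .. $#x) {
              $a += $x[$i];
              $b += (@x - $i) * $x[$i];
          }
          return ($a % $M, $b % $M);
      }

      # Slide the window one byte: drop $old at the front, add $new at the back.
      sub roll {
          my ($a, $b, $old, $new, $len) = @_;
          $a = ($a - $old + $new)      % $M;
          $b = ($b - $len * $old + $a) % $M;
          return ($a, $b);
      }

      # Sanity check: rolling "abcd" forward one byte matches a fresh sum of "bcde".
      my ($a,  $b)  = weak_sum('abcd');
      my ($a2, $b2) = roll($a, $b, ord('a'), ord('e'), 4);
      my ($ra, $rb) = weak_sum('bcde');
      print +($a2 == $ra && $b2 == $rb) ? "rolling ok\n" : "mismatch\n";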