chunlou has asked for the wisdom of the Perl Monks concerning the following question:

Hello. Just wondering, how would you implement your own commit-rollback mechanism (akin to that in databases) for moving files from one place to another?

Since moving a bunch of files could take several minutes, during which a disaster might strike, and you don't want the files ending up scattered randomly across two places, a commit-rollback mechanism would be nice. So, how to do that on a file system without resorting to sorcery, prayers, databases or application servers?

Are there generalized approaches or theories (as opposed to criteria) as to how to do transactions in general?

Thanks.

Replies are listed 'Best First'.
Re: Move files w/ commit-rollback mechanism? (journaled move)
by tye (Sage) on Jun 11, 2003 at 22:32 UTC

    Something like this?

    1. Create a transaction ID, $id. Just use $$ and check for collisions (or hope for no collisions).
    2. Write a transaction journal file (probably just use a good temp-file module) that contains source and destination pairs (and whether it is a move or copy, if you want to support both).
    3. Copy all of the files to the destination file system with names something like ".$id.$desiredFinalName", checking for collisions so you can abort the "transaction".
    4. Wait for a commit request
    5. For each file that was moved:
      1. Append a line to the transaction journal saying you are about to rename the file
      2. Rename the source file to ".$id.$origFileName"
      3. Append a line to the journal saying you have finished renaming the file
    6. For each destination file:
      1. Append a line to the journal saying you are about to rename the file
      2. Rename the file to $desiredFinalName
      3. Append a line to the journal saying you finished renaming
    7. Remove the journal for the successfully committed transaction
    If something fails, you use the journal to roll back.
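The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not tye's code: the function names (`prepare`, `commit`, `recover`), the JSON-lines journal format, and the `committed` marker are all assumptions made for the example. The post also does not say when the hidden originals are deleted; here they are removed only after every destination has been published, which is a guess.

```python
import json
import os
import shutil


def hidden(path, txid):
    """Return the hidden '.<txid>.<name>' sibling of path."""
    folder, name = os.path.split(path)
    return os.path.join(folder, f".{txid}.{name}")


def log(journal, kind, src, dst):
    """Append one record and force it to disk before continuing."""
    with open(journal, "a") as fh:
        fh.write(json.dumps([kind, src, dst]) + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def read_journal(journal):
    records = []
    with open(journal) as fh:
        for line in fh:
            try:
                records.append(json.loads(line))
            except ValueError:
                break  # truncated last line left by a crash mid-write
    return records


def prepare(pairs, journal, txid):
    """Steps 2-3: record the pairs, then copy to hidden destination names."""
    for src, dst in pairs:
        if os.path.exists(dst) or os.path.exists(hidden(dst, txid)):
            raise FileExistsError(dst)  # collision: abort before touching anything
    for src, dst in pairs:
        log(journal, "pair", src, dst)
    for src, dst in pairs:
        shutil.copy2(src, hidden(dst, txid))


def finish(pairs, journal, txid):
    """Step 7 plus cleanup: delete hidden originals, then the journal."""
    for src, _ in pairs:
        if os.path.exists(hidden(src, txid)):
            os.remove(hidden(src, txid))
    os.remove(journal)


def commit(pairs, journal, txid):
    """Steps 5-6: hide the sources, then publish the copies."""
    for src, dst in pairs:
        log(journal, "hide-begin", src, dst)
        os.rename(src, hidden(src, txid))
        log(journal, "hide-done", src, dst)
    for src, dst in pairs:
        log(journal, "publish-begin", src, dst)
        os.rename(hidden(dst, txid), dst)
        log(journal, "publish-done", src, dst)
    log(journal, "committed", "", "")
    finish(pairs, journal, txid)


def recover(journal, txid):
    """After a crash: roll forward if committed, otherwise roll back."""
    records = read_journal(journal)
    pairs = [(s, d) for kind, s, d in records if kind == "pair"]
    if any(kind == "committed" for kind, _, _ in records):
        finish(pairs, journal, txid)
        return "committed"
    published = {d for kind, _, d in records if kind == "publish-begin"}
    for src, dst in pairs:
        if os.path.exists(hidden(src, txid)) and not os.path.exists(src):
            os.rename(hidden(src, txid), src)  # undo step 5
        if dst in published and os.path.exists(dst):
            os.remove(dst)                     # undo step 6
        if os.path.exists(hidden(dst, txid)):
            os.remove(hidden(dst, txid))       # drop an unpublished copy
    os.remove(journal)
    return "rolled back"
```

A run would call `prepare(pairs, journal, txid)`, wait for the commit request, then call `commit(pairs, journal, txid)`. If a journal file is still present at startup, `recover(journal, txid)` decides which way to go. The sketch ignores races with other processes that create files at the destination after the collision check.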

                    - tye
Re: Move files w/ commit-rollback mechanism?
by perrin (Chancellor) on Jun 12, 2003 at 01:30 UTC

      I'm assuming, since the original node said that "moving a bunch of files could take several minutes", that we aren't simply talking about "renaming" a bunch of files within a single file system. Given that, those modules don't support "moving" of files as part of the transaction. You could do all of the copying to temporaries first and then use addfile() to do the renaming step as a transaction, so it might save quite a bit of coding (I only took a quick look).

      They also don't support a rollback (from a persistent journal) if the process actually dies in the middle ("a disaster might strike").

      Or it might be fun to update the modules to support more forms of "moving" files and to add journaling.

                      - tye
Re: Move files w/ commit-rollback mechanism?
by bluto (Curate) on Jun 11, 2003 at 22:45 UTC
    If your source list of files is read-only until a "commit" and only a single process is moving things around, you may be able to do this rather cheesily. For example, if you have an existing directory with files in it, and it isn't being modified, you can mirror it to a new directory, alter the files in the new directory, and then use a symbolic link (in Unix) to switch directories pretty close to instantaneously.
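A minimal sketch of the symbolic-link switch, assuming a Unix file system and a single writer as described above. The name `switch_link` and the temporary link name are made up for the example. The idea is to create the new link under a scratch name and rename it over the old one, since renaming over an existing link replaces it in one step.

```python
import os


def switch_link(link_path, new_target):
    """Repoint link_path at new_target without a window where it is missing."""
    scratch = f"{link_path}.tmp.{os.getpid()}"
    os.symlink(new_target, scratch)   # build the new link beside the old one
    os.replace(scratch, link_path)    # rename over the old link in one step
```

Readers that resolve the link see either the old directory or the new one. Anything that already has a file open in the old directory keeps using it, so the old copy should stay around until those readers are done.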

    Other than that, you (or something your code invokes) will have to keep a journal of every change made (as tye suggests) or, if you feel lucky, use something really cheesy like a copy-on-write filesystem.

    bluto

Re: Move files w/ commit-rollback mechanism?
by aquarium (Curate) on Jun 12, 2003 at 02:15 UTC
    Copy the files instead of moving them. After copying, compare file sizes to the originals. If they match, delete the local copies; if they don't, retry. Be careful along the way if the destination is a different file system (e.g. it can become unavailable due to network failure, media failure, etc.), and watch for a full file system too. It should just gracefully abort if the checking process fails. There's not much you can do if the network goes down halfway through copying (just run it again later), EXCEPT that if you know the destination system type you could waste lots of time writing your own self-contained file installation script plus files, copying that to the destination system first, and then executing it there (rsh?). Mind you, there are (non-Perl) tools with mirroring and commit/rollback already built in ($$$).
      A better option may be an MD5 comparison of the contents of the files. If you're going to go through the trouble of creating a rollback mechanism for copying, you should at least verify that the files are exact matches (well, MD5 has a slight chance of creating the same hash for slightly differing inputs (i.e. the hard drive misreads and puts "I am a lonely man" instead of "I like to dress in women's clothing" and you get the same MD5 for both lines...which you don't, I checked), but the chances are somewhere in the realm of winning the lottery every drawing during a year and then dying from the combination of a shark attack and simultaneously getting struck by lightning, all while in Kansas...driving on the highway...in the middle of winter.... Actually, on second thought your chances may be worse than that...).
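The copy, verify, then delete idea from the two posts above might look something like this. It is only a sketch: `md5_of`, `verified_move`, and the retry count of 3 are assumptions for the example, and it checks both size and MD5 before removing the original.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verified_move(src, dst, retries=3):
    """Copy src to dst, delete src only once the copy checks out."""
    for _ in range(retries):
        shutil.copy2(src, dst)
        same_size = os.path.getsize(src) == os.path.getsize(dst)
        if same_size and md5_of(src) == md5_of(dst):
            os.remove(src)
            return True
    if os.path.exists(dst):
        os.remove(dst)  # give up: leave the original alone, drop the bad copy
    return False
```

Errors such as a full disk or a vanished network mount surface as exceptions from `shutil.copy2`, and the original stays in place because the delete only happens after a successful comparison.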

      antirice    
      The first rule of Perl club is - use Perl
      The ith rule of Perl club is - follow rule i - 1 for i > 1

Re: Move files w/ commit-rollback mechanism?
by mattr (Curate) on Jun 12, 2003 at 13:35 UTC
    I built a file upload system called Filexer which does something equivalent to this for updating a web server. It is not a journaling system, though it does keep a log like tye mentioned. It's not a transaction system either, but I'll describe our experience with it anyway.

    There are a staging server and a live server with various virtual hosts accessible by user permission. The program checks modification dates (md5sums were removed since a converter is run to do a global replace on embedded hyperlinks during conversion). It runs on the live server; the other server is accessible as a mounted drive over a datacenter LAN.

    The requirement was for both sides to be scanned and only files which are older or nonexistent on the live server are candidates for uploading, likewise only those existing on the live server and not the staging server are candidates for deletion. In fact the ability to roll back was not required.
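The selection rule in that paragraph can be sketched as below. This is not Filexer's code; `scan` and `plan` are hypothetical names, and the sketch assumes both trees are reachable as local paths (as with the mounted drive mentioned above).

```python
import os


def scan(root):
    """Map each file's path relative to root to its modification time."""
    found = {}
    for folder, _, names in os.walk(root):
        for name in names:
            full = os.path.join(folder, name)
            found[os.path.relpath(full, root)] = os.path.getmtime(full)
    return found


def plan(staging_root, live_root):
    """Return (upload, delete) lists of relative paths."""
    staging, live = scan(staging_root), scan(live_root)
    # Upload when the live copy is missing or older than the staging copy.
    upload = sorted(p for p, mtime in staging.items()
                    if p not in live or live[p] < mtime)
    # Delete when a file exists on live but no longer on staging.
    delete = sorted(p for p in live if p not in staging)
    return upload, delete
```

Comparing modification times only works if uploaded files keep the staging timestamp, which is presumably why the touch step described in the next paragraph matters.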

    Some fine points are that it makes subdirectories and ignores protected files, and uses a cygwin version of touch which I discovered can set modification times of uploaded files on the live server in Windows. Selected ranges of access logs are downloaded to a place with ftp permissions and email is sent to the user too. Providing a directory listing was also required.

    Anyway, if you run the system on the live server and copy everything first to another folder, then replace files by renaming them, the operation should be pretty quick. Faster would be to make symbolic links on your live server for major folders and then switch the one you want with its updated copy instantaneously. Otherwise you may have people accessing a partially updated folder briefly. I didn't find it a big problem, since we only handled a very small number of files to be uploaded at once.

    I don't know what your purpose is but for us, we had to allow non-technical people to do this work on off-hours. It is far better than depending on some providers' upload request forms. It may sound like some pretty easy stuff but sweating the small points (like NT permissions) can keep you awake for a while. Good luck with your project, it sounds useful.

Re: Move files w/ commit-rollback mechanism?
by Anonymous Monk on Jun 12, 2003 at 00:01 UTC
    I recall that there is something called rdist that might do what you are looking for.
Re: Move files w/ commit-rollback mechanism?
by t'mo (Pilgrim) on Jun 11, 2003 at 23:16 UTC
Re: Move files w/ commit-rollback mechanism?
by zakzebrowski (Curate) on Jun 12, 2003 at 11:26 UTC
    Well, why not use a database on the system you're sending it from? That way, you can have a store of file names, and when you commit or roll back, you can have md5 fields and date fields, and depending on whether you made it or not, you can re-upload or re-download the appropriate versions of the files... (assuming you have the other original files somewhere...)
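One way to picture that bookkeeping is a single table of file names with an md5 field, a date field, and a status. The sketch below uses SQLite purely for illustration; the table layout and the names `open_store`, `queue`, `mark_done`, and `unfinished` are assumptions, not something from the post.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) the bookkeeping database."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS transfers (
               name      TEXT PRIMARY KEY,
               md5       TEXT NOT NULL,
               queued_at REAL NOT NULL,
               status    TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    return db


def queue(db, name, md5):
    """Record that a file is about to be sent."""
    with db:  # commits on success, rolls back on an exception
        db.execute(
            "INSERT OR REPLACE INTO transfers (name, md5, queued_at, status) "
            "VALUES (?, ?, ?, 'pending')",
            (name, md5, time.time()),
        )


def mark_done(db, name):
    """Record that a file arrived and was verified."""
    with db:
        db.execute("UPDATE transfers SET status = 'done' WHERE name = ?", (name,))


def unfinished(db):
    """Names that still need to be re-sent (or undone) after a failure."""
    rows = db.execute(
        "SELECT name FROM transfers WHERE status != 'done' ORDER BY name"
    )
    return [row[0] for row in rows]
```

After a crash, whatever `unfinished` returns is the list of files to re-upload or to clean up, and the stored md5 can be compared against the remote copy to decide which.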

    ----
    Zak
    Pluralitas non est ponenda sine necessitate - mysql's philosophy