weismat has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am using an external ftp server to post data on a web site.
Unfortunately I have only one simultaneous login on this ftp server, but multiple scripts which write files to be transferred.
Fortunately all scripts use the same custom module for the transfer. I have now implemented a locking mechanism based on file existence to ensure that only one process transfers a file at a time.
This works ok, but it is a bit error-prone if a script hangs or terminates at the wrong time.
What alternative approaches would you suggest?

Replies are listed 'Best First'.
Re: Synchronisation between multiple scripts
by BrowserUk (Patriarch) on Jan 16, 2009 at 08:23 UTC

    Create a daemon to handle the connections and transfers: one that opens a local socket and accepts the paths of files to be transferred.

    Have it copy or move the files into a private directory when the transfer request is made and delete them once the transfer is complete. If the daemon dies, it knows what needs to be done when it restarts.

    It can either maintain an open connection to the remote host, or only establish the connection when there are transfers to be done.
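    A minimal sketch of such a daemon using the core IO::Socket::UNIX and Net::FTP modules. The socket path, spool directory, host, and credentials below are placeholders, not details from the thread, and error handling is pared down to the essentials:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::UNIX;
use File::Copy qw(move);
use Net::FTP;

my $sock_path = '/tmp/ftp-queue.sock';   # placeholder socket path
my $spool_dir = '/var/spool/ftp-queue';  # private directory; must exist

# On restart, the daemon would first re-scan $spool_dir here for
# files left over from a crash and transfer them before accepting
# new requests.

unlink $sock_path;
my $server = IO::Socket::UNIX->new(
    Type   => SOCK_STREAM(),
    Local  => $sock_path,
    Listen => 5,
) or die "Cannot listen on $sock_path: $!";

while (my $client = $server->accept) {
    chomp(my $path = <$client>);         # one file path per connection
    close $client;
    next unless defined $path && -f $path;

    # Move into the spool dir first, so a crash leaves a record
    # of pending work.
    (my $name = $path) =~ s{.*/}{};
    move($path, "$spool_dir/$name") or next;

    # Connect per transfer; alternatively, keep one connection open.
    my $ftp = Net::FTP->new('ftp.example.com') or next;  # placeholder host
    $ftp->login('user', 'password')            or next;
    $ftp->put("$spool_dir/$name") and unlink "$spool_dir/$name";
    $ftp->quit;
}
```

    The writer scripts would then just connect to the socket and print the path of the file to transfer, one per connection.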


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I was also thinking about this approach, but I was a bit reluctant to implement it because it was a lot more work than using the lock file.
      If the transfer becomes a bottleneck, I will think about implementing it again.
      At the moment I am busy reducing my watch cycles for the successful transfers, but if I think it is too slow, then I will go for this approach.
        I would go for BrowserUK's approach, and I fail to see how it's going to be a lot more work. In fact, it's probably going to be a lot less work. With a separate program doing the FTP transfers, you only have one place where you have to worry about failed transfers, instead of having to deal with it in all your programs.

        Besides, it follows the Unix toolkit approach: separate things are done by separate programs - each program tuned to do its task very well.

Re: Synchronisation between multiple scripts
by andreas1234567 (Vicar) on Jan 16, 2009 at 07:33 UTC
    Reading Abort if instance already running? is probably helpful.
    --
    No matter how great and destructive your problems may seem now, remember, you've probably only seen the tip of them. [1]
      This is a very helpful reference.
      I might start to use Proc::PID::File with my server name instead of the default $0, instead of my own custom implementation.
Re: Synchronisation between multiple scripts
by eye (Chaplain) on Jan 16, 2009 at 08:34 UTC
    The usual unix/linux approach is locking based on a PID file with the refinements described in the thread referenced by andreas1234567. These refinements generally deal with issues of scripts that hang or terminate prematurely. The situation you describe seems to have greater potential for creating a race condition because there are multiple scripts in the picture rather than a single script with multiple potential instances.
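    One refinement worth spelling out: a flock-based lock, unlike a bare lock file tested for existence, is released by the kernel the moment the holding process exits, even after a crash or kill, which addresses the hang/terminate problem directly. A minimal sketch (the lock file path is a placeholder):

```perl
use strict;
use warnings;
use Fcntl qw(LOCK_EX LOCK_NB);

my $lockfile = '/tmp/ftp-transfer.lock';   # placeholder path

open my $lock, '>', $lockfile or die "Cannot open $lockfile: $!";
unless (flock $lock, LOCK_EX | LOCK_NB) {
    die "Another instance holds the lock; exiting\n";
}

# ... do the FTP transfer while holding the lock ...

# The lock is released automatically when $lock is closed,
# including when the process dies or is killed; no stale
# lock file cleanup is needed.
```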

    One approach would be to create a script that transfers files to the ftp server when they appear in a watched directory. This script could be run as a daemon or through cron depending on how quickly you need it to act. This script would need to assure that only one instance was active at a time, but the techniques described in the aforementioned node should work well for this purpose. Your existing scripts would need to be modified to move files to the watched directory rather than sending them to the remote server.
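    A cron-driven version of that transfer script might look like the sketch below; the watched directory, host, and credentials are placeholders, and in practice it would be combined with a single-instance lock as described in the referenced node:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Net::FTP;

my $watch_dir = '/var/spool/outgoing';   # placeholder watched directory

my @files = glob "$watch_dir/*";
exit 0 unless @files;                    # nothing to do this cycle

my $ftp = Net::FTP->new('ftp.example.com')   # placeholder host
    or die "Cannot connect: $@";
$ftp->login('user', 'password')
    or die "Login failed: ", $ftp->message;

for my $file (@files) {
    # Remove the local copy only after a successful put.
    $ftp->put($file) and unlink $file;
}
$ftp->quit;
```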

    On linux, another approach would be to use the fuse (filesystem in userspace) kernel module with curlftpfs (LftpFS may be a reasonable alternative; avoid FuseFTP as it is not very robust) to mount the remote server as a local file system (I believe this can be done through fstab). This also lets a single entity manage the connection to the remote ftp server. The existing scripts would need to be modified to write files to the appropriate local path (functionally sending them to the remote server via ftp). I believe that this approach will allow the ftp connection to time out while not in use and will only reconnect when needed; it would be best to verify that this is correct.
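    For the curlftpfs route, the mount can be expressed as a one-off command or as an fstab entry; the host, credentials, and mount point below are placeholders, and option names should be checked against the installed curlftpfs version:

```shell
# One-off mount (placeholder host, credentials, and mount point)
curlftpfs ftp://user:password@ftp.example.com /mnt/remote-ftp

# Roughly equivalent /etc/fstab entry:
# curlftpfs#ftp.example.com /mnt/remote-ftp fuse user=user:password,allow_other 0 0
```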

      Does your advisory-locking-on-a-PID-file scheme handle recovery after a system crash?

      Polling directories is a silly way to do things.

      1. How long do you wait after the filename appears, before you decide that the application writing that file has finished doing so?
      2. How do you detect if the process writing the file has hung or crashed part way through writing it?

        Does your advisory-locking-on-a-PID-file scheme handle recovery after a system crash?
        Yes.

        If you review the mentioned thread, the first node by shmem details how the script should verify that the listed PID is actually running the proper script.

        Polling directories is a silly way to do things.
        Silliness is relative. If the files only need to be pushed to the server every hour, polling doesn't seem so bad.
        How long do you wait after the filename appears, before you decide that the application writing that file has finished doing so?
        How do you detect if the process writing the file has hung or crashed part way through writing it?
        In the first approach, I specifically said "...to move files to the watched directory...." This is an important point; perhaps I should have explicitly stated that files should be moved to the directory rather than written to the directory.
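        The reason moving (rather than writing in place) matters is that a rename within one filesystem is atomic: the watcher either sees no file or sees the complete file, never a half-written one. A sketch of the pattern the writer scripts would follow (all paths are placeholders):

```perl
use strict;
use warnings;

my $staging_dir = '/var/spool/staging';    # same filesystem as the watched dir
my $watch_dir   = '/var/spool/outgoing';   # directory the transfer script polls

my $name = 'report.txt';                   # placeholder file name

# Write the file completely in the staging directory first...
open my $fh, '>', "$staging_dir/$name" or die "open: $!";
print $fh "file contents here\n";
close $fh or die "close: $!";

# ...then move it into the watched directory in one atomic step.
# rename() is only atomic within a single filesystem; across
# filesystems File::Copy::move falls back to copy-and-delete,
# which is not atomic.
rename "$staging_dir/$name", "$watch_dir/$name"
    or die "rename: $!";
```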

        Update: Corrected "rather than copied" to "rather than written" in the last sentence.

        Update 2: Corrected HTML formatting of whitespace before update 1.

Re: Synchronisation between multiple scripts
by Bloodnok (Vicar) on Jan 16, 2009 at 10:49 UTC
    Unless I've (not for the first time) missed the point, this sounds like a classic case for a class following the singleton/monadic pattern: it handles the ftp protocol whilst providing a common interface to the writer scripts (and also, implicitly, providing the locking).

    A user level that continues to overstate my experience :-))