Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

My Perl script makes about 40 files, then FTPs the files to a server. Right now the script takes about 30 seconds to generate the files and 30 seconds to FTP them. Is there a way to generate one file, FTP that file and start generating the second file while the first is being uploaded as opposed to generating all 40 first, then uploading all 40? This would save me about half the time. The reason I am doing this is for a pseudo-realtime update, so I would like the update faster.

Replies are listed 'Best First'.
Re: FTP in background
by pjf (Curate) on Oct 09, 2001 at 03:21 UTC
    The way which immediately springs to mind (and of course, TMTOWTDI) is to have your script fork into two processes, one of which creates the files, and the other of which uploads them.

    You'll need a way of signalling to the FTP process that a particular file is finished, so it doesn't try to upload a half-finished file. Provided that you're not dealing with an NFS filesystem, flock can work nicely here: the consumer process efficiently blocks until it's its turn to read the file. There is a rare race condition where the consumer process could lock the file before the producer process has had a chance to lock it first, so it's probably wise for the consumer never to attempt to lock an empty file. This assumes that an empty file is never valid; if an empty file is valid, then you could use file permission bits instead to indicate when a file is ready for uploading.
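    A minimal sketch of the fork-and-flock idea, with the actual Net::FTP upload left as a comment; the file names are invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical file names for illustration
my @files = map { "report$_.dat" } 1 .. 3;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {                        # child: the consumer/uploader
    for my $file (@files) {
        select undef, undef, undef, 0.1 until -s $file;  # never lock an empty file
        open my $fh, '<', $file or die "open $file: $!";
        flock $fh, LOCK_SH or die "flock: $!";  # blocks while the producer holds LOCK_EX
        # ... upload $file with Net::FTP here ...
        close $fh;
    }
    exit 0;
}

for my $file (@files) {                 # parent: the producer
    open my $fh, '>', $file or die "open $file: $!";
    flock $fh, LOCK_EX or die "flock: $!";  # hold the lock while writing
    print {$fh} "data for $file\n";
    close $fh;                          # closing releases the lock
}
waitpid $pid, 0;
```

    The consumer's empty-file check is the guard against the race described above; on a real run you'd replace the comment with the Net::FTP calls.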

    You could use shared memory to pass information about files around, which is a very good solution, as the consumer process can just wait for a message to go ahead with the next file. You could replace shared memory with unix domain sockets here if you felt more comfortable with them.
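    If sockets appeal more than shared memory, a socketpair gives you a simple message channel between the two processes. In this sketch the producer sends each filename down the socket once the file is ready; the file names and the log file (a stand-in for the real upload) are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;
use IO::Handle;

socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {                         # consumer
    close $writer;
    open my $log, '>', 'uploaded.log' or die "open log: $!";
    while (my $file = <$reader>) {       # blocks until the next name arrives
        chomp $file;
        # ... upload $file with Net::FTP here (stand-in: log it) ...
        print {$log} "$file\n";
    }
    close $log;
    exit 0;
}

close $reader;
$writer->autoflush(1);                   # deliver each name immediately
for my $file (map { "file$_.dat" } 1 .. 3) {
    # ... generate $file here ...
    print {$writer} "$file\n";           # tell the consumer this one is done
}
close $writer;                           # EOF tells the consumer to stop
waitpid $pid, 0;
```

    Closing the writer end doubles as the "no more files" signal, so no separate marker is needed.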

    If fork() gives you the willies, then you could just split your script into two and make sure that one is always started with the other (a small shell script can help here). This means you can't do shared memory, but if you're not comfortable with fork, then the chances are you're not comfortable with shared memory either.

    Hope that this helps you get started.

    Cheers
    Paul

Re: FTP in background
by princepawn (Parson) on Oct 09, 2001 at 02:47 UTC
    I'm not a big fan of POE (cuz it's so big and the mailing list is so quiet) but I think POE::FTP will do that.

    No one has coded a Net::FTP::ParallelUserAgent yet.

Re: FTP in background
by cLive ;-) (Prior) on Oct 09, 2001 at 04:49 UTC
    How about:
    • create a tmp directory
    • fork a child to start an FTP session, then scan that directory for files, upload any it finds, and unlink them - scan/sleep/scan/sleep/scan/upload etc..
    • create the files elsewhere and move them to the tmp dir when complete (using flock to stop the FTP process from grabbing partial files too early).
    • when the last file is created, the parent sets a marker somewhere that the FTP child can read to know that "no more files are coming", and then quits
    • the child continues to ftp and unlink files it finds in the tmp dir. When done, it removes the tmp directory, closes the FTP session and quits.
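    A sketch of that scheme, with the FTP upload replaced by a log write as a stand-in; the directory, marker, and file names are all invented:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Copy qw(move);

my $tmpdir = 'outgoing';
my $marker = "$tmpdir/.done";          # "no more files are coming"
mkdir $tmpdir or die "mkdir: $!" unless -d $tmpdir;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {                       # child: scan/sleep/upload/unlink
    open my $log, '>', 'sent.log' or die "open log: $!";
    while (1) {
        my $done = -e $marker;         # check *before* scanning, so nothing is missed
        for my $file (glob "$tmpdir/*") {  # the dot-file marker isn't matched
            open my $fh, '<', $file or next;
            flock $fh, LOCK_SH;        # wait out a writer still holding the lock
            # ... upload $file here (stand-in: log it) ...
            print {$log} "$file\n";
            close $fh;
            unlink $file;
        }
        last if $done;                 # marker seen and the dir just drained
        sleep 1;
    }
    close $log;
    unlink $marker;
    rmdir $tmpdir or die "rmdir: $!";
    exit 0;
}

for my $n (1 .. 3) {                   # parent: create elsewhere, move when done
    my $file = "file$n.dat";
    open my $fh, '>', $file or die "open $file: $!";
    print {$fh} "data $n\n";
    close $fh;
    move $file, "$tmpdir/$file" or die "move: $!";
}
open my $m, '>', $marker or die "open marker: $!";  # set the marker
close $m;
waitpid $pid, 0;
```

    Checking the marker before each scan avoids the race where the child sees the marker after an empty scan and quits while moved-in files are still waiting.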

    What d'ya think?

    cLive ;-)

Re: FTP in background
by tachyon (Chancellor) on Oct 09, 2001 at 07:01 UTC

    You want to use forking. For some things you can just fork 40 kids and let each handle one file. However, when you are doing network stuff you need to be a little careful with fork(): 40 kids all trying to FTP simultaneously will saturate the network and, worse still, the collisions will slow the overall data transfer rate down.

    Here are some links to nodes where I have posted some fork examples of increasing complexity.
    Forked off!
    Help with waitpid and forking ?
    Parallel::ForkManager vs global variables

    I would write the files one at a time and fork a kid to send each one. With this approach you will get collisions if it takes longer to FTP the files than to write them. Alternatively, fork off a daemon which sends the files as they become available. There are a number of methodologies you can use to make sure the child waits for all the files and dies when finished.
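    A sketch of the write-then-fork-a-kid approach, throttled so only a few kids upload at once; the cap, the file names, and the .sent file standing in for the actual Net::FTP upload are all invented:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my $max_kids = 4;                           # cap concurrent uploads
my %kids;

for my $n (1 .. 10) {
    my $file = "file$n.dat";
    open my $fh, '>', $file or die "open $file: $!";  # write the file first
    print {$fh} "data $n\n";
    close $fh;

    while ((my $done = waitpid(-1, WNOHANG)) > 0) {   # reap any finished kids
        delete $kids{$done};
    }
    while (keys %kids >= $max_kids) {       # at the cap: block until one exits
        my $done = waitpid(-1, 0);
        delete $kids{$done};
    }

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                        # kid: send this one file
        # ... upload $file with Net::FTP here (stand-in: mark it sent) ...
        open my $s, '>', "$file.sent" or die "open: $!";
        close $s;
        exit 0;
    }
    $kids{$pid} = 1;
}
1 while waitpid(-1, 0) > 0;                 # wait for the stragglers
```

    Keeping the cap well below 40 is what stops the saturation problem above; Parallel::ForkManager wraps the same bookkeeping if you'd rather not do it by hand.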

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print