in reply to Gracefully exiting and restarting at a later time.

It might be better if you recorded your status after you've finished with each file. The advantage of this is that then even if your script dies unexpectedly, you can continue processing from wherever you were. For example, your script can record the list of files it has fully processed in a separate progress file, and when you re-run the script, it should read that file and skip the files that are already processed.
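
A minimal sketch of that progress-file idea follows. The names are my own placeholders, not anything from your script: `run_with_progress`, `load_done`, the progress file path, and the `$process_file` callback are all hypothetical, and I assume one filename per line with no embedded newlines.

```perl
use strict;
use warnings;
use IO::Handle;    # provides autoflush() on lexical filehandles

# Read the names already recorded in the progress file (hypothetical helper).
# A missing progress file simply means nothing has been processed yet.
sub load_done {
    my ($progress_path) = @_;
    my %done;
    if (open my $in, '<', $progress_path) {
        while (my $line = <$in>) {
            chomp $line;
            $done{$line} = 1 if length $line;
        }
        close $in;
    }
    return \%done;
}

# Process each file not yet recorded, appending its name once it is finished.
sub run_with_progress {
    my ($progress_path, $files, $process_file) = @_;
    my $done = load_done($progress_path);

    open my $out, '>>', $progress_path
        or die "Cannot append to $progress_path: $!";
    $out->autoflush(1);    # each name reaches the file right after it is printed

    for my $file (@$files) {
        next if $done->{$file};     # finished in an earlier run
        $process_file->($file);     # assumed to die on failure
        print {$out} "$file\n";     # recorded only after the work succeeded
    }
    close $out or die "Cannot close $progress_path: $!";
    return;
}
```

You would call it as something like `run_with_progress('progress.log', \@files, \&process_one)`. Because the name is written only after the callback returns, a file that was interrupted halfway is simply processed again on the next run.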

Once you implement this, if processing the files is idempotent, then you can simply kill the script at any time, whatever it's doing. Otherwise, you may want to implement a cleaner way to shut down, like you mention in your question: e.g. periodically check for the existence of a file, and if it does not exist (because you've deleted it), quit the script. Even then, though, it's worth saving the progress regularly, such as by writing to the progress file after each file, to avoid having to redo all the computation if the script dies for some unexpected reason, be it unexpected input, a bug in your script, power failure, full memory, or something else.
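
The "quit when a file disappears" check could look roughly like this. It is only a sketch under my own assumptions: the function name `process_until_stopped`, the sentinel path, and the callback are made up for illustration, and the check happens only between files so that no file is abandoned halfway.

```perl
use strict;
use warnings;

# Process files one at a time while the sentinel file exists.
# Deleting the sentinel asks the script to stop before the next file.
sub process_until_stopped {
    my ($sentinel, $files, $process_file) = @_;
    my @remaining = @$files;

    while (@remaining) {
        last unless -e $sentinel;          # sentinel gone: stop cleanly
        $process_file->(shift @remaining);
    }
    return \@remaining;                    # files that were not processed
}
```

You create the sentinel (for instance with `touch keep_running`) before starting the script and delete it when you want the script to stop. Whatever is left in the returned list is what the next run still has to do.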

Let me point to an example that may help you. The script "wgetas - download many small files by HTTP, saving to filename of your choice" has two measures for continuing after an interruption. First, to avoid repeating successful downloads, the script does not attempt to download any file if the output file it would save to already exists locally. This works only because the script creates the output file atomically, so the output file cannot exist if the script was interrupted during the download of that same file. Second, to avoid retrying downloads that have failed in a permanent way (such as the file not existing on the remote server), if the script is invoked with the -e option, a progress file is written with the names of downloads already processed. (It's important that output to the progress file is flushed after writing each filename.)
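
One common way to get that atomic creation is to write to a temporary file in the same directory and rename it into place once it is complete. The sketch below shows that technique in general. It is not copied from wgetas, which may do it differently, and `save_atomically` and the `$fetch` callback are hypothetical names of mine.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Create $target only once its full content is available.
# Returns 0 if $target already existed (work skipped), 1 if it was created.
sub save_atomically {
    my ($target, $fetch) = @_;
    return 0 if -e $target;      # finished in an earlier run

    my $content = $fetch->();    # may die; nothing has been created yet

    # The temporary file lives in the target's directory so that the
    # rename below never has to cross a filesystem boundary.
    my ($tmp_fh, $tmp_name) = tempfile('partXXXXXX', DIR => dirname($target));
    binmode $tmp_fh;
    print {$tmp_fh} $content or die "Write to $tmp_name failed: $!";
    close $tmp_fh or die "Close of $tmp_name failed: $!";

    # On POSIX systems a rename within one filesystem is atomic, so
    # $target either does not exist or holds the complete content.
    rename $tmp_name, $target or die "Rename to $target failed: $!";
    return 1;
}
```

With this in place, "does the output file exist?" is a reliable test for "was this item finished?". A crash in the middle leaves at most a stray `part...` file, never a half-written target.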

Re^2: Gracefully exiting and restarting at a later time.
by Largins (Acolyte) on Dec 21, 2011 at 12:09 UTC

    Greetings

    This idea, combined with keeping the information in the database, is what I shall do. I will store the filename and directory in the table after processing, and use auto-commit.

    The download portion has already been completed, so I don't have to worry about that.

    I will still have to rewalk the directory tree, but the restart will only have to check the directory name to find out if it is in the right place. This is necessary (I'm pretty sure) to avoid bypassing unfinished nodes in the directory walk.

    Thanks one and all for your input, I have made my decision, and as always, it is a better one than when I started thanks to Perl Monks!

    Largins