Have you considered processing the files in parallel using a demand-based algorithm rather than trying to predict their runtime? For example, have X processes, and each time one of the processes finishes, it can simply consume one of the remaining files?
I've had success with this approach in the past. I generally do something like:
```perl
while (1) {
    # Are we supposed to pause or stop?
    if (-e "/tmp/pause") {
        sleep 10;
        next;
    }
    last if -e "/tmp/STOP";

    # Find a file to process
    my @filist = glob("*.input");
    sleep 5 unless @filist;   # nothing to do yet; don't busy-loop
    while (my $file = shift @filist) {
        # Try to claim the file
        if (rename $file, "$file.working.$$") {
            # Successful, so go handle it
            process_file("$file.working.$$");
            last;
        }
        # Didn't get the file (someone else might've claimed it just
        # as we tried to), so loop to the next one...
    }
}
```
The rename essentially lets the OS handle the task of serializing the processes: on every operating system I generally use, renaming a file (on a local filesystem) is an atomic operation, so exactly one process "wins" each file, and the losing processes simply move on and try the next one.
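To see the claim mechanism in isolation, here's a minimal sketch of two "workers" racing for the same file: the first rename succeeds, and the second fails because the original name is already gone. (The file and directory names here are invented for the demo.)

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A scratch directory with one job file in it
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/job.input";
open my $fh, '>', $file or die "create: $!";
close $fh;

# Worker 1 claims the file by renaming it...
my $claimed_a = rename $file, "$file.working.1";
# ...so worker 2's rename of the now-missing name fails.
my $claimed_b = rename $file, "$file.working.2";

print "worker 1 claimed: ", ($claimed_a ? "yes" : "no"), "\n";
print "worker 2 claimed: ", ($claimed_b ? "yes" : "no"), "\n";
```

Only one rename can succeed no matter how the two calls interleave, which is the whole point of using it as the claim step.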
The advantages of demand-based sharing are:
- Bad runtime predictions can't cause task starvation.
- It balances itself automatically, so it's easy to add or remove processes based on system load.
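The "have X processes" part can be as simple as forking N copies of the claim loop. Here's a self-contained sketch (the job names and worker count are invented for the demo): each child claims files via rename until none remain, and because the claim is atomic, every file is processed exactly once no matter how the workers interleave.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use POSIX qw(_exit);

# Set up a scratch directory with 8 job files
my $dir = tempdir(CLEANUP => 1);
for my $n (1 .. 8) {
    open my $fh, '>', "$dir/job$n.input" or die "create: $!";
    close $fh;
}

my $WORKERS = 3;
my @pids;
for (1 .. $WORKERS) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: claim-and-process loop, exiting when no work remains
        while (my @files = glob("$dir/*.input")) {
            for my $file (@files) {
                next unless rename $file, "$file.working.$$";
                # "Process" the claimed file, then mark it done
                rename "$file.working.$$", "$file.done";
            }
        }
        _exit(0);   # skip END blocks so the child doesn't clean up the tempdir
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;

my @done = glob("$dir/*.done");
print scalar(@done), " files processed\n";   # prints "8 files processed"
```

Adding capacity is just forking another worker; removing it is letting one exit after its current file, with no scheduler or shared state to coordinate.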
...roboticus
When your only tool is a hammer, all problems look like your thumb.