in reply to Proportional distribution of indivisible items

Have you considered processing the files in parallel using a demand-based algorithm rather than trying to predict their runtime? For example, have X processes, and each time one of the processes finishes, it can simply consume one of the remaining files?

I've had success with it in the past. I generally do it something like:

while (1) { # Are we supposed to pause or stop? if (-E "/tmp/pause") { sleep 10; next; } last if -E "/tmp/STOP"; # Find a file to process my @filist = glob("*.input"); while (my $file = shift @filist) { # try to claim the file if (rename $file, "$file.working.$$") { # Successful, so go handle it process_file("$file.working.$$"); last; } # Didn't get the file (someone else might've claimed it just # as we tried to), so just loop to the next one... } }

The rename is essentially to let the OS handle the task of serializing the processes: on all operating systems I generally use, renaming (on a local filesystem) is considered to be an "atomic" operation, so only one process would "win" the file, and other processes would have to go and try the next one.

The advantages of demand-based sharing are:

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Replies are listed 'Best First'.
Re^2: Proportional distribution of indivisible items
by anonymized user 468275 (Curate) on Aug 15, 2018 at 15:38 UTC
    It seems an interesting idea to accept them from a queue like 5 government workers pressing the button for the next ticket number held by "customers". It would certainly mean the system is sensitive to processes completing earlier than predicted. But in this context, the first available worker at any time cannot always optimally take the largest process in the queue. For example, in the simplest case of two workers with three items left in the queue, when one becomes available 3 minutes ahead of the other and the remaining process times are estmated at 8, 5 and 4 minutes; taking the 8 minute one would leave the other worker with 12 minutes (update 3 to be free plus the 5 and 4 remaining) to your 8 whereas taking the 5 minute one would leave him 11 (3+8) minutes to your 9 (5+4). So yes, I see benefit in dynamic selection although it doesn't actually remove the problem of optimal picking (based on averaged runtimes and best fit) from what is remaining to be processed, but it does add another independent optimisation to the best solution.

    One world, one people