in reply to fixed set of forked processes

I would approach this sort of problem by defining a fixed, configurable (small) number of threads, all built to do the same thing: read a work-request from a single queue (e.g. Thread::Queue::Duplex), perform the unit of work (inside an eval{} block), and write a response-record to the same queue or to a different one.

All of the threads, no matter how many there are, read from and write to the same queues.   So, when a record is written to “the request queue,” no one really cares which thread winds up picking up the request and running it.

The threads, in turn, are built to survive:   any runtime error that occurs during processing is absorbed, and a record of the event is simply added to the response-record for someone else down the line to deal with.
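
As a sketch of that pool, using the core Thread::Queue module rather than Thread::Queue::Duplex (the pool size, the squaring "unit of work," and the OK/ERROR record format are all invented for illustration):

```perl
#!/usr/bin/perl
# A minimal sketch of the fixed worker pool described above: N identical
# threads pull work-requests from one shared queue, run each unit of
# work inside eval{}, and push a response-record onto a second queue.
use strict;
use warnings;
use threads;
use Thread::Queue;

my $NUM_WORKERS = 4;                  # fixed, configurable pool size
my $requests    = Thread::Queue->new;
my $responses   = Thread::Queue->new;

sub worker {
    # An undef request is the conventional "shut down" signal.
    while ( defined( my $job = $requests->dequeue ) ) {
        my $result = eval { $job * $job };   # illustrative unit of work
        # Absorb any runtime error; record it for someone downstream.
        $responses->enqueue( $@ ? "ERROR($job): $@" : "OK($job): $result" );
    }
}

my @pool = map { threads->create( \&worker ) } 1 .. $NUM_WORKERS;

$requests->enqueue( 1 .. 10 );                  # the work-requests
$requests->enqueue( (undef) x $NUM_WORKERS );   # one terminator per worker

$_->join for @pool;

my @results = map { $responses->dequeue_nb } 1 .. 10;
print "$_\n" for @results;
```

Because every worker reads the same queue, the responses arrive in whatever order the threads finish; anything that needs ordering belongs downstream, keyed off the response-records.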

To avoid too much competition for the “single file,” you might dedicate one thread to the task of reading a block of records from the file and pushing them onto the request queue.   By some appropriate means, let that thread snooze until the number of enqueued items drops below some threshold, at which point it reads a few more records from the file to recharge the queue.
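
One way to sketch that dedicated reader (the low-water mark, block size, and in-memory stand-in for the real file are all made up for the example):

```perl
#!/usr/bin/perl
# Sketch of a feeder thread that keeps the request queue topped up in
# blocks, snoozing whenever enough work is already pending.
use strict;
use warnings;
use threads;
use Thread::Queue;

my $requests  = Thread::Queue->new;
my $LOW_WATER = 5;     # recharge when pending drops below this
my $BLOCK     = 10;    # records read per recharge

# Stand-in for the real "single file".
my $data = join "", map "record $_\n", 1 .. 42;
open my $fh, '<', \$data or die $!;

my $feeder = threads->create( sub {
    while (1) {
        if ( $requests->pending < $LOW_WATER ) {
            my @records;
            while ( @records < $BLOCK and defined( my $line = <$fh> ) ) {
                chomp $line;
                push @records, $line;
            }
            last unless @records;              # file exhausted
            $requests->enqueue(@records);
        }
        select undef, undef, undef, 0.01;      # snooze briefly
    }
    $requests->enqueue(undef);                 # signal end of input
} );

# Here the main thread stands in for the worker pool.
my @seen;
while ( defined( my $rec = $requests->dequeue ) ) {
    push @seen, $rec;
}
$feeder->join;
print scalar(@seen), " records consumed\n";
```

However fast the consumers run, the queue never holds more than roughly LOW_WATER + BLOCK records, which is what bounds memory when the input file is huge.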

In this way, the jobs are indeed “processed in parallel,” but you maintain control over the multiprogramming level at all times.   Such a system can perform work at a predictable, steady rate no matter how many jobs ultimately need to be run.   The size of the file affects only the wall-time required to finish the work, not the rate at which it is carried out.

Replies are listed 'Best First'.
Re^2: fixed set of forked processes
by BrowserUk (Patriarch) on Dec 02, 2010 at 20:13 UTC

    Please don't suggest the use of Thread::Queue::Duplex until you've used it, and therefore encountered its limitations.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: fixed set of forked processes
by anonymized user 468275 (Curate) on Dec 02, 2010 at 18:33 UTC
    Hmm, although I prefer fork to threads so as to survive later versions of perl, this does give me an idea of how to implement my own workers using fork in a way that overcomes my filehandle problem with the standard drop-in solution:-

    Since I know in advance that I am going to use the maximum configured number of subprocesses (there are 150000 jobs in the queue being rapidly thrown at my scheduling architecture), I could start by forking precisely that number of subprocesses using open '|-', let the children live to the very end, and send the code they have to manage over the pipe.

    Update: but then, whether I do that or use your queued-thread approach, I also need to read back from the child in order to perform complicated load-balancing. If the subprocesses were allowed to die per iteration (i.e. per job parsed and submitted to a child or thread), I wouldn't have that problem.
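
    One way to get that read-back is to give each long-lived child an explicit pipe in each direction, since open '|-' alone only provides the parent-to-child side. A sketch, with an invented doubling job and simple round-robin dispatch standing in for the real load-balancing:

```perl
#!/usr/bin/perl
# Fixed set of long-lived forked children, each with a request pipe
# (parent writes) and a response pipe (parent reads back results).
use strict;
use warnings;
use IO::Handle;    # autoflush() on lexical filehandles

my $NUM_CHILDREN = 3;
my @children;

for my $id ( 1 .. $NUM_CHILDREN ) {
    pipe my $req_r, my $req_w or die "pipe: $!";   # parent -> child
    pipe my $res_r, my $res_w or die "pipe: $!";   # child  -> parent
    defined( my $pid = fork ) or die "fork: $!";
    if ( $pid == 0 ) {    # child: serve jobs until EOF on the request pipe
        close $req_w;
        close $res_r;
        # Close inherited handles belonging to earlier siblings, so each
        # child sees EOF as soon as the parent closes its request pipe.
        for my $sib (@children) { close $sib->{req}; close $sib->{res} }
        $res_w->autoflush(1);
        while ( my $job = <$req_r> ) {
            chomp $job;
            my $result = eval { 2 * $job };        # illustrative work
            print {$res_w} ( $@ ? "ERROR: $@" : $result ), "\n";
        }
        exit 0;
    }
    close $req_r;
    close $res_w;
    $req_w->autoflush(1);
    push @children, { pid => $pid, req => $req_w, res => $res_r };
}

# Round-robin the jobs over the children, reading each answer back.
my @answers;
for my $job ( 1 .. 9 ) {
    my $c = $children[ $job % $NUM_CHILDREN ];
    print { $c->{req} } "$job\n";
    chomp( my $answer = readline $c->{res} );
    push @answers, $answer;
}

close $_->{req} for @children;     # EOF lets the children exit
waitpid $_->{pid}, 0 for @children;
print "@answers\n";
```

    This version waits for each answer before dispatching the next job, which keeps the sketch simple; a real load-balancer would select() over the response handles and dispatch to whichever child is idle.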

    One world, one people