in reply to POE&WWW::Mechanize or POE&POE::Component::Client::HTTP
In my experience, making a data-munging operation "multi-threaded" is not the problem. The real problems with these programs are operational in nature: making them robust in the presence of errors, making them restartable, being able to add or modify functionality easily, and being able to test parts of the process in isolation.
The paradigm I've used over and over again is something I'll just call the "work pool" approach. Basically, you have a "database" that contains a list of all the tasks that need to be performed. Then you have worker processes which acquire tasks, perform them, and mark them as completed. As part of performing a task, a worker process can add additional tasks to the database (or "work pool").
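To make the acquire / perform / mark-completed cycle concrete, here is a minimal sketch. It is written in Python with the standard `sqlite3` module purely for illustration (the same shape would carry over to Perl and DBI). The table layout, the state names, and the function names `open_pool`, `add_task`, `acquire_task`, `complete_task` and `run_worker` are all my own assumptions, not part of any existing framework.

```python
import sqlite3

def open_pool(path=":memory:"):
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN IMMEDIATE below is the only transaction control in play.
    db = sqlite3.connect(path, isolation_level=None)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY,"
        " kind TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'pending')"
    )
    return db

def add_task(db, kind, payload):
    db.execute("INSERT INTO tasks (kind, payload) VALUES (?, ?)", (kind, payload))

def acquire_task(db):
    # Take the write lock first so two workers cannot claim the same row.
    db.execute("BEGIN IMMEDIATE")
    try:
        row = db.execute(
            "SELECT id, kind, payload FROM tasks"
            " WHERE state = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            db.execute("UPDATE tasks SET state = 'running' WHERE id = ?", (row[0],))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return row

def complete_task(db, task_id):
    db.execute("UPDATE tasks SET state = 'done' WHERE id = ?", (task_id,))

def run_worker(db, handlers):
    # handlers maps a task kind to a function that performs the task and
    # returns a list of (kind, payload) follow-up tasks to add to the pool.
    done = 0
    while True:
        task = acquire_task(db)
        if task is None:
            return done
        task_id, kind, payload = task
        for new_kind, new_payload in handlers[kind](payload):
            add_task(db, new_kind, new_payload)
        complete_task(db, task_id)
        done += 1
```

Because the pool lives in a file when `path` is a real filename, several ordinary processes can call `run_worker` against it at once, and a stopped run can be resumed later. A task left in the `running` state by a crashed worker would need some cleanup policy, which this sketch does not attempt.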
This, of course, is not a new idea. Programs like sendmail operate in this fashion, using the file system for the work pool. You can also use a real database (like MySQL), a persistent hash implementation like GDBM, or even a commercial offering like MQSeries.
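For the file-system flavour, one common approach is a spool directory in which a task is claimed by renaming its file. The sketch below is Python again, for illustration only. It is not a description of how sendmail itself manages its queue, and the directory names and function names are assumptions of mine. It relies on `os.rename` being atomic within a single POSIX file system, so only one worker can win the rename.

```python
import os
import tempfile
import uuid

STATES = ("pending", "running", "done")

def make_spool(root=None):
    # One subdirectory per task state; a fresh temp dir if no root is given.
    if root is None:
        root = tempfile.mkdtemp()
    for state in STATES:
        os.makedirs(os.path.join(root, state), exist_ok=True)
    return root

def add_task(root, payload):
    # Write outside pending/ first, then rename in, so a worker never
    # sees a half-written task file.
    name = uuid.uuid4().hex
    tmp = os.path.join(root, "." + name + ".tmp")
    with open(tmp, "w") as fh:
        fh.write(payload)
    os.rename(tmp, os.path.join(root, "pending", name))
    return name

def acquire_task(root):
    pending = os.path.join(root, "pending")
    for name in sorted(os.listdir(pending)):
        claimed = os.path.join(root, "running", name)
        try:
            os.rename(os.path.join(pending, name), claimed)
        except FileNotFoundError:
            continue  # another worker claimed this task first
        return name, claimed
    return None

def complete_task(root, name):
    os.rename(os.path.join(root, "running", name),
              os.path.join(root, "done", name))
```

A nice side effect of this layout is that `ls pending/ running/ done/` tells you the state of the whole job at a glance, and moving a file back into `pending/` by hand reruns that task.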
The advantages of structuring your application like this are numerous. For starters, you can make your application "multi-threaded" by using ordinary processes. Stopping and restarting your application is now possible, since its state is stored persistently. Also, if you make each worker process task-specific, it is very easy to control which tasks get performed. Tasks which can't be performed due to resource unavailability errors (e.g. remote web site not available, not enough local disk space, etc.) can just be put back in the work pool to be executed later.
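The last two points (task-specific workers, and putting a task back when a resource is unavailable) might look roughly like this. It is a self-contained Python sketch for illustration. The `ResourceUnavailable` exception, the `not_before` column, and the `run_once` function with its `retry_delay` parameter are hypothetical names I made up for the example.

```python
import sqlite3
import time

class ResourceUnavailable(Exception):
    """Raised by a handler when the task should be retried later."""

def open_pool(path=":memory:"):
    db = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    db.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY,"
        " kind TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'pending',"
        " not_before REAL NOT NULL DEFAULT 0)"
    )
    return db

def run_once(db, kind, handler, now=None, retry_delay=300):
    # A worker only ever looks at tasks of its own kind, so choosing which
    # workers to start decides which tasks get performed.
    now = time.time() if now is None else now
    db.execute("BEGIN IMMEDIATE")
    row = db.execute(
        "SELECT id, payload FROM tasks"
        " WHERE state = 'pending' AND kind = ? AND not_before <= ?"
        " ORDER BY id LIMIT 1",
        (kind, now),
    ).fetchone()
    if row is None:
        db.execute("COMMIT")
        return "idle"
    db.execute("UPDATE tasks SET state = 'running' WHERE id = ?", (row[0],))
    db.execute("COMMIT")
    try:
        handler(row[1])
    except ResourceUnavailable:
        # Put the task back in the pool, not to be retried before now + delay.
        db.execute(
            "UPDATE tasks SET state = 'pending', not_before = ? WHERE id = ?",
            (now + retry_delay, row[0]),
        )
        return "requeued"
    db.execute("UPDATE tasks SET state = 'done' WHERE id = ?", (row[0],))
    return "done"
```

Only `ResourceUnavailable` causes a requeue here. Any other exception propagates and leaves the task marked `running`, which seems the safer default for errors that a retry would not fix.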
In the past I've just rolled my own work pool implementation from scratch. Sometimes I used the file system; other times I used a database. This is such a useful and common pattern that it would be very helpful to have a framework (specifically a Perl framework) for implementing work pools.
Hope this helps.
Re^2: POE&WWW::Mechanize or POE&POE::Component::Client::HTTP
by jasonk (Parson) on Dec 25, 2007 at 03:18 UTC
by lestrrat (Deacon) on Dec 25, 2007 at 10:22 UTC