Here's my 2 cents -- ++ this node if you like the idea.

In my experience, making a data-munging operation "multi-threaded" is not the hard part. The real problems with these programs are operational in nature: making them robust in the presence of errors, making them restartable, being able to easily add or modify functionality, and being able to test parts of the process in isolation.

The paradigm I've used over and over again is something I'll just call the "work pool" approach. Basically, you have a "database" that contains a list of all the tasks that need to be performed. Then you have worker processes which acquire a task, perform it, and then mark it as completed. As part of performing a task, a worker process can add new tasks to the database (or "work pool").
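To make the acquire/perform/complete cycle concrete, here is a minimal sketch of a work pool backed by SQLite, written in Python with the standard `sqlite3` module rather than in Perl. The table layout, the `pending`/`running`/`done` states and all function names are my own assumptions for illustration, and the `split`/`echo` handlers are hypothetical. `BEGIN IMMEDIATE` takes SQLite's write lock, so two worker processes sharing the same database file cannot claim the same row, and `finish_task` adds follow-up tasks and marks the current one done in a single transaction.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    kind    TEXT NOT NULL,
    payload TEXT NOT NULL,
    state   TEXT NOT NULL DEFAULT 'pending'  -- pending, running or done
)
"""


def open_pool(path):
    # isolation_level=None: the driver stays out of the way and we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def add_task(conn, kind, payload):
    conn.execute("INSERT INTO tasks (kind, payload) VALUES (?, ?)", (kind, payload))


def acquire_task(conn, kind=None):
    """Atomically claim the oldest pending task (optionally of one kind), or return None."""
    conn.execute("BEGIN IMMEDIATE")  # write lock: no two workers can claim the same row
    try:
        row = conn.execute(
            "SELECT id, kind, payload FROM tasks WHERE state = 'pending' "
            "AND (? IS NULL OR kind = ?) ORDER BY id LIMIT 1",
            (kind, kind),
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE tasks SET state = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row


def finish_task(conn, task_id, follow_ups=()):
    """Enqueue any follow-up tasks and mark this one done, in a single transaction."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.executemany("INSERT INTO tasks (kind, payload) VALUES (?, ?)", follow_ups)
        conn.execute("UPDATE tasks SET state = 'done' WHERE id = ?", (task_id,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def handle(kind, payload):
    # Hypothetical handlers: a "split" task fans out into one "echo" task per word.
    if kind == "split":
        return [("echo", word) for word in payload.split()]
    return []


def run_worker(conn, kind=None):
    """Acquire, perform and complete tasks until none are pending; returns how many ran."""
    count = 0
    while True:
        task = acquire_task(conn, kind)
        if task is None:
            return count
        task_id, task_kind, payload = task
        finish_task(conn, task_id, handle(task_kind, payload))
        count += 1


if __name__ == "__main__":
    conn = open_pool(":memory:")  # use a file path to share the pool between processes
    add_task(conn, "split", "a b c")
    print(run_worker(conn), "tasks processed")  # prints "4 tasks processed"
```

Pointing several worker processes at the same database file (instead of `:memory:`) gives you concurrency from ordinary processes. A task left in `running` by a crashed worker would still need a recovery rule of some kind, for example resetting such rows to `pending` when the application restarts.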

This, of course, is not a new idea. Programs like sendmail operate in this fashion, using the file system as the work pool. You can also use a real database (like MySQL), a persistent hash implementation like GDBM, or even a commercial offering like MQSeries.
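The file-system variant can be surprisingly small. Below is a rough Python sketch (not sendmail's actual queue format) that uses one directory per task state and relies on `os.rename` being atomic within a single POSIX file system: whichever worker manages to rename a file from `pending/` into `running/` owns that task. The directory layout and the function names are assumptions made up for this example.

```python
import os
import tempfile


def make_pool(root):
    # tmp/ holds half-written tasks; the other directories are the task states.
    for sub in ("tmp", "pending", "running", "done"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)


def enqueue(root, name, data):
    """Write the task into tmp/ first, then rename it into pending/ so no worker sees a partial file."""
    tmp_path = os.path.join(root, "tmp", name)
    with open(tmp_path, "w") as f:
        f.write(data)
    os.rename(tmp_path, os.path.join(root, "pending", name))


def claim(root):
    """Claim one pending task by renaming it into running/; returns its name, or None."""
    for name in sorted(os.listdir(os.path.join(root, "pending"))):
        try:
            os.rename(os.path.join(root, "pending", name), os.path.join(root, "running", name))
            return name
        except FileNotFoundError:
            continue  # another worker renamed it first
    return None


def complete(root, name):
    os.rename(os.path.join(root, "running", name), os.path.join(root, "done", name))


def demo():
    with tempfile.TemporaryDirectory() as root:
        make_pool(root)
        enqueue(root, "task-0001", "http://example.com/")
        enqueue(root, "task-0002", "http://example.org/")
        first = claim(root)
        complete(root, first)
        second = claim(root)  # task-0002 is now in running/
        third = claim(root)   # nothing left in pending/
        return first, second, third, sorted(os.listdir(os.path.join(root, "done")))


if __name__ == "__main__":
    print(demo())  # prints ('task-0001', 'task-0002', None, ['task-0001'])
```

Because the state of each task is just which directory its file lives in, you can inspect or repair the pool with ordinary shell tools, which is a large part of the appeal of this approach.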

The advantages of structuring your application like this are numerous. For starters, you can make your application "multi-threaded" simply by running several ordinary worker processes. You can also stop and restart your application at any point, since its state is stored persistently. If you, say, make each worker process task-specific, it is very easy to control which tasks get performed. And tasks which can't be performed due to resource unavailability (e.g. the remote web site is down, or there isn't enough local disk space) can just be put back in the work pool to be executed later.
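Here is a rough sketch of the "put it back for later" idea, again in Python with the standard `sqlite3` module. The `attempts` and `not_before` columns, the `TransientError` exception and the `run_once` helper are hypothetical names of my own; the `kind` argument shows how a task-specific worker would only ever pick up its own kind of task. When the handler raises `TransientError`, the task goes back to `pending` with a "not before" time instead of being lost.

```python
import sqlite3
import time


class TransientError(Exception):
    """Hypothetical: a handler raises this when a resource is temporarily unavailable."""


def open_pool(path):
    conn = sqlite3.connect(path, isolation_level=None)  # manual BEGIN/COMMIT
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id         INTEGER PRIMARY KEY AUTOINCREMENT,
               kind       TEXT NOT NULL,
               payload    TEXT NOT NULL,
               state      TEXT NOT NULL DEFAULT 'pending',
               attempts   INTEGER NOT NULL DEFAULT 0,
               not_before REAL NOT NULL DEFAULT 0)"""
    )
    return conn


def acquire(conn, kind, now):
    """Claim the oldest pending task of this kind whose retry time has arrived."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, payload FROM tasks WHERE state = 'pending' AND kind = ? "
            "AND not_before <= ? ORDER BY id LIMIT 1",
            (kind, now),
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE tasks SET state = 'running', attempts = attempts + 1 WHERE id = ?",
                (row[0],),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row


def run_once(conn, kind, handler, now=None, retry_delay=60.0):
    """Process at most one task; returns 'done', 'requeued', or None if nothing is ready."""
    now = time.time() if now is None else now
    task = acquire(conn, kind, now)
    if task is None:
        return None
    task_id, payload = task
    try:
        handler(payload)
    except TransientError:
        # Put the task back in the pool; it won't be picked up before now + retry_delay.
        conn.execute(
            "UPDATE tasks SET state = 'pending', not_before = ? WHERE id = ?",
            (now + retry_delay, task_id),
        )
        return "requeued"
    conn.execute("UPDATE tasks SET state = 'done' WHERE id = ?", (task_id,))
    return "done"
```

The current time is passed in explicitly (defaulting to `time.time()`) so the retry behaviour is easy to test; a real worker would loop over `run_once` and sleep for a while whenever it returns `None`.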

In the past I've just rolled my own work pool implementation from scratch. Sometimes I used the file system, other times I used a database. This is such a useful and common pattern that it would be very helpful to have a framework (specifically a Perl framework) for implementing work pools.

Hope this helps.


In reply to Re: POE&WWW::Mechanize or POE&POE::Component::Client::HTTP by pc88mxer
in thread POE&WWW::Mechanize or POE&POE::Component::Client::HTTP by spx2
