When faced with anything that looks like a pipeline, I like to split it into many small parts and keep each one as simple as possible.

It seems to me that you have three steps:

  1. Procure the incoming data
  2. Post the data on the external forum
  3. Put the data into the processing queue for the next process

As long as performance permits it, I would use only one process for any one step, as multiple processes will give you the headaches of concurrency.

If you have a proper file locking scheme (easily available under Win32; not so easily, but already demonstrated here, under Unixish filesystems), you can use a separate process for each step, which makes restarting individual items much easier. Each item then becomes a file that is moved from directory to directory as it progresses through the stages of your pipeline. A status query is reduced to finding where a file resides in the directory tree, plus a check that no file in any of the intermediate directories is older than (say) 5 minutes.
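A minimal sketch of the directory-per-stage idea, in Python for brevity. The stage names and function names are made up for illustration. Note that it hands items over with an atomic rename instead of explicit file locks, which is a substitution on my part; it only holds when all stage directories live on the same filesystem.

```python
import os
import time
from pathlib import Path

# Hypothetical stage names, one directory per pipeline step.
STAGES = ["incoming", "posted", "queued"]

def init_pipeline(root):
    """Create one directory per stage under root."""
    for stage in STAGES:
        (Path(root) / stage).mkdir(parents=True, exist_ok=True)

def advance(root, name, src, dst):
    """Move an item to the next stage.

    Returns False if the file is no longer in src, which means
    another process has already moved it.
    """
    try:
        os.rename(Path(root) / src / name, Path(root) / dst / name)
        return True
    except FileNotFoundError:
        return False

def locate(root, name):
    """Status query: return the stage holding the item, or None."""
    for stage in STAGES:
        if (Path(root) / stage / name).exists():
            return stage
    return None

def stale_items(root, max_age_seconds):
    """List (stage, filename) pairs whose file is older than max_age_seconds."""
    now = time.time()
    found = []
    for stage in STAGES:
        for path in sorted((Path(root) / stage).iterdir()):
            if now - path.stat().st_mtime > max_age_seconds:
                found.append((stage, path.name))
    return found
```

Something like `stale_items(root, 300)` run from cron would then cover the "nothing older than 5 minutes" check. The age is taken from the file's modification time, which a rename does not reset, so it measures time since the item was written rather than time spent in the current directory.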

If you have no way of proper locking through files, a database gives you easy concurrency and proper locking. Put your data in a table row together with a status column, and all processes can manipulate the data even more easily. I would still restrict input to one process to avoid feeding duplicates, but if you construct your SQL and the status values properly, you can have as many processes working on items as you wish or your system allows. Status queries are then simple SQL, but taking an item out of the processing pipeline requires setting the status instead of moving the file. This may or may not be a problem for bulk changes, depending on how much access you have to the live system.
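The status column approach could look roughly like this. It is only a sketch: the table layout, the status values and the function names are my assumptions, and SQLite stands in for whatever database you actually have. The point is the guarded `UPDATE`, which only succeeds for the one process that still sees the old status.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) items table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'new')"
    )
    conn.commit()
    return conn

def add_item(conn, payload):
    """Insert a new item with the default status and return its id."""
    cur = conn.execute("INSERT INTO items (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid

def claim(conn, from_status, to_status):
    """Claim the oldest item in from_status by switching it to to_status.

    Returns the item id, or None if nothing was available or another
    process changed the status first.
    """
    row = conn.execute(
        "SELECT id FROM items WHERE status = ? ORDER BY id LIMIT 1",
        (from_status,),
    ).fetchone()
    if row is None:
        return None
    # The extra "AND status = ?" makes the update a no-op if someone
    # else claimed the row between our SELECT and our UPDATE.
    cur = conn.execute(
        "UPDATE items SET status = ? WHERE id = ? AND status = ?",
        (to_status, row[0], from_status),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None

def status_counts(conn):
    """Status query: number of items per status."""
    return dict(conn.execute("SELECT status, COUNT(*) FROM items GROUP BY status"))
```

Taking an item out of the pipeline is then one more `UPDATE` to some "held" status, and a bulk change is the same statement with a wider `WHERE` clause.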


In reply to Re: Managing a web form submission work queue by Corion
in thread Managing a web form submission work queue by Limbic~Region
