When faced with anything that looks like a pipeline process, I like to split it into many small steps and keep each one as simple as possible.
It seems to me that you have three steps.
As long as performance permits, I would use only one process per step, as multiple processes will give you the headaches of concurrency.
If you have a scheme of proper file locking (as is easily available under Win32, and not so easily, but already demonstrated here, under Unixish filesystems), you can use a separate process for each step, which makes restarting individual items much easier. Each item then becomes a file that is moved from directory to directory as it progresses through the stages of your pipeline. Status queries reduce to finding where a file resides in the directory tree, plus a check that no file is older than (say) 5 minutes in any of the intermediate directories.
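A minimal sketch of that layout in Perl, assuming made-up stage directories queue/incoming, queue/processing and queue/done, plus a lockfile queue/.lock to serialize the claim step:

    use strict;
    use warnings;
    use Fcntl ':flock';
    use File::Copy 'move';

    # Hypothetical stage directories - adjust to your layout.
    my $incoming   = 'queue/incoming';
    my $processing = 'queue/processing';
    my $done       = 'queue/done';

    # Serialize the claim step so two workers never grab the same item.
    open my $lock, '>>', 'queue/.lock' or die "Cannot open lockfile: $!";
    flock $lock, LOCK_EX or die "Cannot lock: $!";

    opendir my $dh, $incoming or die "Cannot read $incoming: $!";
    my ($item) = grep { -f "$incoming/$_" } readdir $dh;
    closedir $dh;

    if (defined $item) {
        # Moving the file marks it as claimed.
        move("$incoming/$item", "$processing/$item")
            or die "Cannot claim $item: $!";
    }
    flock $lock, LOCK_UN;    # other workers may claim items again

    if (defined $item) {
        # ... do the actual work on "$processing/$item" here ...
        move("$processing/$item", "$done/$item")
            or die "Cannot finish $item: $!";
    }

    # Status check: warn about items stuck in processing for over 5 minutes.
    opendir my $ph, $processing or die "Cannot read $processing: $!";
    for my $f (grep { -f "$processing/$_" } readdir $ph) {
        warn "$f stuck in processing\n"
            if -M "$processing/$f" > 5 / (24 * 60);
    }
    closedir $ph;

The move within one filesystem is a rename, which is atomic, so a claimed item can never end up in two stages at once.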
If you have no way of proper locking through files, a database supplies you with easy concurrency and proper locking. Put each item in a table row together with a status column, and all processes can manipulate the data even more easily. I would still restrict input to one process to avoid feeding duplicates, but if you construct your SQL and the status handling properly, you can have as many processing workers as you wish/your system allows. Status queries are then simple SQL, but taking an item out of the processing pipeline requires setting its status instead of moving a file - this may or may not be a problem for bulk changes, depending on how much access you have to the live system.
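A comparable sketch with DBI, assuming a hypothetical table queue(id, payload, status) in SQLite; a worker claims a row by tagging its status with a worker-unique value in a single atomic UPDATE:

    use strict;
    use warnings;
    use DBI;

    # Hypothetical schema: queue(id INTEGER PRIMARY KEY, payload TEXT,
    # status TEXT), where status is 'new', 'done' or a worker's claim tag.
    my $dbh = DBI->connect('dbi:SQLite:dbname=queue.db', '', '',
                           { RaiseError => 1, AutoCommit => 1 });

    # Claim one item: the single UPDATE is atomic, so two workers can
    # never tag the same row.
    my $tag = "worker-$$";
    $dbh->do(q{
        UPDATE queue
           SET status = ?
         WHERE id = (SELECT id FROM queue WHERE status = 'new' LIMIT 1)
    }, undef, $tag);

    my ($id, $payload) = $dbh->selectrow_array(
        q{SELECT id, payload FROM queue WHERE status = ?}, undef, $tag);

    if (defined $id) {
        # ... do the actual work on $payload here ...
        $dbh->do(q{UPDATE queue SET status = 'done' WHERE id = ?},
                 undef, $id);
    }

    # Status query: how many items sit in each stage of the pipeline.
    for my $row (@{ $dbh->selectall_arrayref(
            q{SELECT status, COUNT(*) FROM queue GROUP BY status}) }) {
        printf "%-12s %d\n", @$row;
    }

Because the claim tag stays on the row if a worker dies, a stuck item shows up directly in the status query and can be re-queued with a single UPDATE back to 'new'.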