Reliable Work Queue Manager

saintmike has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Reliable Work Queue Manager by ph713 (Pilgrim) on Oct 27, 2005 at 05:30 UTC
I've done similar modules from scratch for projects before. For maximum reliability, simplicity, and portability, I tend to rely on *nix atomic rename() calls within a fixed work queue directory to provide the guarantees I need.

The general idea is: You have a queue directory that all the readers and writers know the location of. The "enqueue()" method writes your work to a temporary file in that directory (say ".__w58yto4er.qfile", name generated by File::Temp, but with your .__ prefix and .qfile suffix). Once it is successfully written to disk, you then do an atomic rename() call to remove the ".__" from the name. Queue readers dequeue work by scanning for files that don't start with dots in the work queue directory. If a worker decides to take a job, the first thing he does is another atomic rename(), to ".--w58yto4er.qfile", before working on the request; he moves on silently if the rename fails (because another worker beat him to it).

With just that basic technique you have a disk-persistent unordered work queue that's atomically self-consistent and allows multiple readers and writers. At startup time after a crash you can rename all the .-- files back to normal names before firing up the workers, to restart work on jobs that were interrupted by a crash or whatever. Be sure to use a (preferably data-)journalling filesystem and all that jazz.

If you want the queue to be ordered, you can encode a job serial number in the queue file name in place of the File::Temp random characters; you'll just need a locked source of incrementing serial numbers. Perhaps a special file inside or outside the queue directory which starts with the contents "1". The enqueue() method could first wait on an advisory flock() on the serial number file, then once the lock is obtained, read the "1", write "2" to a new temporary file, and then atomically rename() the new temporary file over the name of the serial file he just read, and then close() and release the flock on the now-dead "1" file. Now he can use "1" for his job number, and the next guy in line will receive "2". If enqueue()-ers fail at odd points in the process you may have holes in your serial number sequence, but never duplicates. Just food for thought if you're thinking of rolling your own.

Edited to add: I just re-read that and the flock/rename() serial method doesn't really work, because the flock() would be tied to the inode, not the name. The problem with just locking and overwriting the contents of the file is that a crash halfway through might leave a corrupted serial number. But then again, in your startup code after a crash I suppose you could reset the serial number to "1" for an empty queue directory, or to the next number after the highest job still sitting in the queue, assuming the serial numbers don't have meaning outside of the queue and therefore the sequence doesn't need to generate truly unique numbers for all time.
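For anyone who wants to see the rename() scheme from this post spelled out, here is a rough sketch. It is written in Python rather than Perl purely for illustration, and the function names (enqueue, claim, recover) are made up for the example. Only the ".__" / ".--" prefixes and the ".qfile" suffix come from the post. It assumes a POSIX filesystem where rename() within one directory is atomic.

```python
import os
import tempfile


def enqueue(queue_dir, payload):
    """Write payload (bytes) to a hidden temp file, then rename it into view."""
    fd, tmp_path = tempfile.mkstemp(prefix=".__", suffix=".qfile", dir=queue_dir)
    with os.fdopen(fd, "wb") as fh:
        fh.write(payload)
        fh.flush()
        os.fsync(fh.fileno())  # get the data onto disk before publishing the job
    # Dropping the ".__" prefix makes the job visible to readers in one step.
    # Assumption: the random part never collides with an existing job name.
    name = os.path.basename(tmp_path)[len(".__"):]
    os.rename(tmp_path, os.path.join(queue_dir, name))
    return name


def claim(queue_dir):
    """Try to claim one visible job; return the claimed path, or None."""
    for name in os.listdir(queue_dir):
        if name.startswith(".") or not name.endswith(".qfile"):
            continue  # hidden files are half-written or already claimed
        claimed = os.path.join(queue_dir, ".--" + name)
        try:
            os.rename(os.path.join(queue_dir, name), claimed)
        except FileNotFoundError:
            continue  # another worker beat us to it; move on silently
        return claimed
    return None


def recover(queue_dir):
    """Run before starting workers after a crash: un-claim interrupted jobs."""
    restored = 0
    for name in os.listdir(queue_dir):
        if name.startswith(".--") and name.endswith(".qfile"):
            os.rename(os.path.join(queue_dir, name),
                      os.path.join(queue_dir, name[len(".--"):]))
            restored += 1
    return restored
```

A worker would loop on claim(), process the file it gets back, and presumably delete it when done. The post does not say what happens to finished jobs or to stale ".__" files left by a crashed writer, so both are left out here.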
Re^2: Reliable Work Queue Manager by Adze (Acolyte) on Oct 28, 2005 at 14:50 UTC
Some good advice above. I have used IPC::DirQueue with good results in the past. Read the specs of the maildir format at <http://cr.yp.to/proto/maildir.html> - quite a profound lesson to be learnt about the combined power of the filesystem and atomic syscalls when it comes to designing systems which are robust and avoid contention issues (e.g. flock over NFS). I would recommend against reinventing the wheel here.
Re: Reliable Work Queue Manager by merlyn (Sage) on Oct 26, 2005 at 21:31 UTC
You could steal some of the ideas I used on my Class::DBI-based link checker.
-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
Re: Reliable Work Queue Manager by perrin (Chancellor) on Oct 26, 2005 at 21:41 UTC
I wrote a job queue manager for $work, but it's currently too tied into our particular application for me to release it. It's probably overkill for you though. I think you could just use Parallel::ForkManager for this. See the URL example in the docs.
Re: Reliable Work Queue Manager by kscaldef (Pilgrim) on Oct 27, 2005 at 05:45 UTC
Spread is used reasonably widely, but generally not by big companies, due to an unfortunate license. From what I hear, it's reasonably good, if you can get over that.
Re: Reliable Work Queue Manager by pajout (Curate) on Oct 27, 2005 at 07:57 UTC
I plan to write something like Oracle's queueing, but not as feature-rich. I think an SQL database is a better layer than the filesystem, because of transactions and concurrency.
Re^2: Reliable Work Queue Manager by radiantmatrix (Parson) on Oct 27, 2005 at 13:13 UTC
I tend to agree that a database is the way to approach this, but there are two ways I can think of that avoid installing a 3rd-party DB for what might be a very simple queue. The first suggestion would be to implement said queue using DBD::SQLite2 via DBI. This path would allow an easy upgrade to a full-scale RDBMS in the future. Then there's DBM::Any. DBM files could work well for this if you plan carefully, and thanks to DBD::DBM, one could ensure a reasonable path to a full RDBMS in the future.
<-radiant.matrix->
A collection of thoughts and links from the minds of geeks
The Code that can be seen is not the true Code
"In any sufficiently large group of people, most are idiots" - Kaa's Law
Re^3: Reliable Work Queue Manager by ph713 (Pilgrim) on Oct 27, 2005 at 18:20 UTC
Having a truly transactionally coherent ACID relational database at your disposal of course makes things like persistent job queues a breeze. Then you just need a "jobs" table with an autoincrementing serial number for a key and you're pretty much done. But I'd be careful of thinking that using DBM-based solutions buys you the same guarantees as a real database; I would imagine there are a lot of corner cases for recovery that don't work out so well in certain failure scenarios.

But, IMHO, relying on having a full-blown RDBMS installed anywhere you use your queueing library isn't very good planning either. A lot of good n-tier application architectures can be built assuming you have a disk-persistent network-transparent queueing system in place that you can run on arbitrary machines without the support of an RDBMS or specific dbm implementation, along the lines of IBM MQSeries in terms of functionality. You don't want to have to put local RDBMSs on every node involved in the architecture in order for it to work.

My opinion is that for a general-purpose queueing module one should build a solution based on queue directory management using whatever atomic and/or locking resources you have available. POSIX specifies atomic rename() and good fcntl() locks. Then you can deploy the module in any reasonable environment which implements the POSIX primitives you depend on (excuse my *nix centricity, I don't take Windows seriously at all, and I'm not really sure whether it supports POSIX in that regard).

Come to think of it, sendmail/postfix + (E)SMTP amounts to a disk-persistent network-transparent job queueing system using POSIX primitives and a queue directory on the local machines. It makes a fine example, spam problems notwithstanding :)
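To make the "jobs" table idea from the start of this reply concrete, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for whatever database you actually have, not DBI. The "claimed" column, the function names, and the column layout are assumptions made up for the example. The only part taken from the reply is a jobs table keyed by an autoincrementing serial number.

```python
import sqlite3


def open_queue(path):
    """Open (or create) the queue database and its jobs table."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # only transactions are the ones started explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"   # the job serial number
        " payload TEXT NOT NULL,"
        " claimed INTEGER NOT NULL DEFAULT 0)"     # hypothetical status flag
    )
    return conn


def db_enqueue(conn, payload):
    """Insert a job and return the serial number the database assigned."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def db_claim(conn):
    """Claim the oldest unclaimed job; return (id, payload), or None."""
    # BEGIN IMMEDIATE takes the write lock up front, so two workers cannot
    # both select the same row and then both mark it claimed.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE claimed = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET claimed = 1 WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row
```

Crash recovery here would amount to resetting claimed to 0 for jobs that never finished, which mirrors renaming the ".--" files back in the directory approach.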
Re: Reliable Work Queue Manager by Anonymous Monk on Oct 27, 2005 at 20:07 UTC
For something a bit interesting:
Amazon Simple Queue Service (Beta)
Amazon's Simple Queue Service (XML.com)