joe425 has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I am trying to set up a simple job queue that will allow tasks to be worked on in parallel from different processes. I am quite new to Perl so I may have missed something obvious that I should be doing.

I have used IPC::DirQueue, and this works well when I have one 'worker' task. From the CPAN page http://search.cpan.org/~jmason/IPC-DirQueue-1.0/lib/IPC/DirQueue.pm it looks like exactly the solution I need.

When I try to run two or more worker tasks, occasionally more than one worker will pick up the same job.

This is a problem for efficiency, since each task is quite long, and it also causes errors: once two workers pick up the same job, one of them fails when trying to access the job's data file or when trying to mark the job as finished, because the other worker has already finished the job and the files have been deleted.

Error: IPC::DirQueue: unlink failed: DirQueueTest/active/50.20140528105122431255.DNzE2 at C:/Perl 64/site/lib/IPC/DirQueue.pm line 831.

The following short code causes the problem for me: worker.pl (run two or more times)

use strict;
use warnings;
use IPC::DirQueue;
use IPC::DirQueue::Job;

my $dq = IPC::DirQueue->new({ dir => 'DirQueueTest' });

while (1) {
    my $job = $dq->wait_for_queued_job(0);
    if (!$job) {
        print "no jobs left\n";
    }
    else {
        my $TestNumber = $job->{metadata}->{TestNumber};
        my $QueuePid   = $job->{metadata}->{QueuePid};
        print "Worker:$$ Running QueuePid=$QueuePid, TestNum=$TestNumber\n";
        $job->finish();
    }
}
queue.pl (run once)
use strict;
use warnings;
use IPC::DirQueue;

my $DirQueueOpts = {
    dir                  => 'DirQueueTest',
    active_file_lifetime => (60 * 60 * 24),
};

my $TestNum = 0;
while ($TestNum < 100000) {
    print "QueuePid=$$, TestNum=$TestNum\n";
    my $dq = IPC::DirQueue->new($DirQueueOpts);
    my $QNXBuildMetaData = { TestNumber => $TestNum, QueuePid => $$ };
    $dq->enqueue_string("QueuePid=$$, TestNum=$TestNum", $QNXBuildMetaData);
    $TestNum = $TestNum + 1;
}

I have also looked at using Directory::Queue::Simple, but it seems to have a similar problem. Sometimes $dirq->lock($name) works correctly and returns false when the job has already been locked by the other worker; but sometimes it returns true for both workers, and both start the same job.

I am using ActiveState Perl v5.16.3 build 1604

Is this a known problem with locking files on Windows platforms? Or should I be doing some error checking that I have not understood?

Any suggestions for alternative approaches to this problem are also welcome, but I was hoping to get away without having to run a server based message queue or database.
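For reference, this is the kind of atomic claim I was hoping these modules would provide. The sketch below (my own code, not taken from either module; the file names are made up) uses sysopen with O_CREAT|O_EXCL, which as I understand it is an atomic create-or-fail even on Windows, so two racing workers can never both win:

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Try to claim a job by creating its lock file exclusively.
# O_CREAT|O_EXCL is atomic: the open succeeds for exactly one
# process, even when several race to claim the same job.
sub try_claim {
    my ($lockfile) = @_;
    if (sysopen(my $fh, $lockfile, O_WRONLY | O_CREAT | O_EXCL)) {
        print {$fh} "$$\n";   # record which worker holds the claim
        close $fh;
        return 1;             # we own the job
    }
    return 0;                 # someone else claimed it first
}

my $lock = "job-0001.lock";   # hypothetical lock-file name
if (try_claim($lock)) {
    print "claimed\n";
    # ... do the work, then release the claim ...
    unlink $lock;
}
else {
    print "already claimed\n";
}
```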

Replies are listed 'Best First'.
Re: Problems using file based queues on windows
by GrandFather (Saint) on May 29, 2014 at 11:22 UTC

    About four years ago I was looking around for ways to solve the same problem in the context of a build system. In the end I settled on a hand-rolled database based solution because I couldn't find anything on CPAN that fit well. Over the four years since the system was turned on it's processed about 800,000 tasks and now runs around a dozen "task clients" on Windows XP, Vista and Windows 7 (32 and 64 bit), various versions of Debian and a couple of Mac OS X versions. The database based design has been very successful!

    The original design relied on a server that handled email used to manage tasks, but there is now a web UI for the system so the only thing the server really does is provide the database. I find having task history recorded in a database is a great tool for diagnosing issues with the system and tracking the evolution of tasks and the system over time.

    Perl is the programming world's equivalent of English
Re: Problems using file based queues on windows (race)
by tye (Sage) on May 29, 2014 at 15:34 UTC

    There are reliable and portable ways to prevent such race conditions easily from Perl. Unfortunately, IPC::DirQueue seems disappointingly naive on this point:

    # now, we want to try to avoid 2 or 3 dequeuers removing
    # the lockfile simultaneously, as that could cause this race:
    #
    # dqproc1: [checks file] [unlinks] [starts work]
    # dqproc2: [checks file]           [unlinks]
    #
    # ie. the second process unlinks the first process' brand-new
    # lockfile!
    #
    # to avoid this, use a random "fudge" on the timeout, so
    # that dqproc2 will wait for possibly much longer than
    # dqproc1 before it decides to unlink it.
    #
    # this isn't perfect. TODO: is there a "rename this fd" syscall
    # accessible from perl?
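    The kind of race-free claim that comment's TODO gestures at doesn't need a "rename this fd" syscall; plain rename() on a path is enough. A sketch (not IPC::DirQueue's actual code; file names are illustrative): each worker renames the queued job file to a name containing its own PID. The rename either succeeds or fails atomically, so when two workers race for one job, exactly one wins and the loser's rename() returns false because the source file is already gone.

```perl
use strict;
use warnings;

# Claim a job by renaming it to a worker-unique name. Only one
# rename() of a given source path can succeed; the losers see a
# false return because the source no longer exists.
sub claim_job {
    my ($queued) = @_;
    my $claimed = "$queued.claimed.$$";
    return rename($queued, $claimed) ? $claimed : undef;
}

# demo: enqueue one job file, then claim it
open my $fh, '>', 'job-42' or die "cannot create job: $!";
close $fh;

if (my $mine = claim_job('job-42')) {
    print "worker $$ owns $mine\n";
    # ... process the job, then clean up ...
    unlink $mine;
}
else {
    print "lost the race\n";
}
```

    On Windows the rename also fails if the destination already exists, but since each worker's destination name embeds its PID, that never blocks a legitimate claim.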

    - tye