queues between threads

John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

The Thread::Queue module makes it easy to program a multi-threaded application on the producer/consumer model. One thread puts things in the queue; the other thread takes things out, or blocks if the queue is empty.
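For reference, the pattern described above might look like the following minimal sketch. It assumes a perl built with thread support; the item values and the use of undef as an end-of-data marker are choices made for illustration only.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new;

# Consumer: dequeue() blocks while the queue is empty.
# An undef item serves as the end-of-data marker (an assumption of this sketch).
my $consumer = threads->create(sub {
    my @seen;
    while (defined(my $item = $queue->dequeue)) {
        push @seen, $item;
    }
    return scalar @seen;
});

# Producer: enqueue() returns immediately, however far behind the consumer is.
$queue->enqueue($_) for 1 .. 5;
$queue->enqueue(undef);

my $count = $consumer->join;
print "consumer received $count items\n";
```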

What's missing from this pipeline model is a way to make the producer block when the queue is full. There is no concept of queue capacity in this class.

Why does it matter? Well, consider why I'm looking at the producer/consumer model in the first place. I have a number of steps to perform, and the data can be huge. Doing f(g(h(data))) requires the entire input and output of each chained call, one call at a time, to be held in a Perl scalar. Instead, I'll program h(data)|g|f. The h step reads the data a chunk at a time and hands each result to g, and so on. That way I never have huge amounts of data in the system, and each step can choose its own chunk boundaries.

So, what happens if the producer thread is faster than the consumer thread? The queue grows without bound. First the process starts swapping to virtual memory, even though simply giving the consumer more time would get on with work that needs doing anyway and avoid the swapping; eventually it could run out of memory entirely.

Since the new threading stuff is in its infancy, I suppose this could be an oversight, and the lack of other variations is just because nobody's written them yet.

So, is there a way to handle this with supplied stuff, or will I need to wrap the queue or implement my own?
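To make the "wrap the queue" option concrete, here is a rough sketch of one way it might be done, pairing a Thread::Queue with a Thread::Semaphore that counts free slots. BoundedQueue is a hypothetical name, not an existing module, and the capacity of 2 and the undef end marker are arbitrary choices for the demonstration.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Thread::Semaphore;

# Hypothetical wrapper class; not part of core Perl or CPAN.
package BoundedQueue;

sub new {
    my ($class, $capacity) = @_;
    return bless {
        queue => Thread::Queue->new,
        slots => Thread::Semaphore->new($capacity),   # free slots remaining
    }, $class;
}

sub enqueue {
    my ($self, $item) = @_;
    $self->{slots}->down;               # blocks while the queue is full
    $self->{queue}->enqueue($item);
}

sub dequeue {
    my ($self) = @_;
    my $item = $self->{queue}->dequeue; # blocks while the queue is empty
    $self->{slots}->up;                 # hand the freed slot back to the producer
    return $item;
}

package main;

my $q = BoundedQueue->new(2);           # at most 2 items waiting at once

my $consumer = threads->create(sub {
    my $total = 0;
    while (defined(my $n = $q->dequeue)) {   # undef marks end of data
        $total += $n;
    }
    return $total;
});

$q->enqueue($_) for 1 .. 10;            # stalls whenever the consumer lags
$q->enqueue(undef);

my $total = $consumer->join;
print "total: $total\n";
```

The wrapper hash itself is copied into each thread, but the queue and semaphore inside it are shared objects, so both copies operate on the same underlying data.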

I wish they had data pipes, too. Easy enough to tie a handle around a queue, but it suffers from the same problem. My tie implementation would need to add its own semaphore for that, so why bother with a queue object? Just use an array and a semaphore or other locking primitive.
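The array-plus-locking alternative could be sketched roughly as below, using lock, cond_wait and cond_broadcast from threads::shared in place of a semaphore. The names put and take, the capacity of 4, and the undef end marker are all assumptions made for illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @buffer :shared;
my $CAPACITY = 4;    # arbitrary limit chosen for this sketch

# Producer side: wait until there is room, then append.
sub put {
    my ($item) = @_;
    lock(@buffer);
    cond_wait(@buffer) while @buffer >= $CAPACITY;
    push @buffer, $item;
    cond_broadcast(@buffer);    # wake a consumer waiting on an empty buffer
}

# Consumer side: wait until there is data, then remove the oldest item.
sub take {
    lock(@buffer);
    cond_wait(@buffer) while !@buffer;
    my $item = shift @buffer;
    cond_broadcast(@buffer);    # wake a producer waiting on a full buffer
    return $item;
}

my $consumer = threads->create(sub {
    my @got;
    while (defined(my $chunk = take())) {   # undef marks end of data
        push @got, $chunk;
    }
    return join ',', @got;
});

put("chunk$_") for 1 .. 10;
put(undef);

my $result = $consumer->join;
print "$result\n";
```

Both conditions (full and empty) share a single lock here, which is why each wait sits in a while loop that rechecks its own condition after waking.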

—John
