in reply to Re^3: Sharing large data structures between threads
in thread Sharing large data structures between threads
I must use long timeouts (at least 5 minutes), and I have to retry the operations with not too much of a delay.
In this case I would probably use a pool of n (one per core) LWP::Parallel::UserAgent workers feeding off a common queue. That way you can potentially make full use of both your cores and your bandwidth, whatever combination of instantaneous loads your program finds itself dealing with.
I fully appreciate your preference for the simple linear flow of a blocking-IO architecture, but LWP::Parallel::UserAgent's three callbacks make for a reasonably sane and manageable alternative.
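The pool-of-workers-plus-queue shape described above is language-agnostic, so here is a minimal sketch in Python rather than Perl (in Perl the same shape falls out of threads plus Thread::Queue). The `fetch` function is a hypothetical stand-in for the real HTTP request; the long-timeout and retry requirements from the quoted post are shown as parameters:

```python
import queue
import threading

def fetch(url, timeout=300):
    """Hypothetical stand-in for the real HTTP fetch
    (LWP::Parallel::UserAgent in Perl). Assumed to raise
    OSError on a timeout."""
    return f"body of {url}"

def worker(q, results, retries=3):
    while True:
        url = q.get()
        if url is None:              # sentinel: no more work
            q.task_done()
            return
        for attempt in range(retries):
            try:
                results.append((url, fetch(url)))
                break                # success: stop retrying
            except OSError:
                continue             # retry after a timeout
        q.task_done()

def run_pool(urls, n_workers=4):
    q = queue.Queue()
    results = []
    threads = [threading.Thread(target=worker, args=(q, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for u in urls:
        q.put(u)                     # feed the common queue
    for _ in threads:
        q.put(None)                  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    return results
```

Whichever worker is free next picks up the next URL, so slow responses on one connection never idle the other workers.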
A couple of speculations from reading between the lines of your posts:
If so, I'd suggest that you seriously consider separating those two pieces of functionality.
This is almost always a bad idea. (Indeed, large shared data structures are usually a bad idea full stop, but I digress.)
In order for there to be a work item in your large shared data structure (LSDS), something has to write it there. And then your workers have to trundle around that LSDS re-reading the same data over and over, trying to recognise what is new, and which of the new items are as-yet unactioned work.
And of course, as you have many workers, you also have to deal with the possibility that two or more workers happen across the same new work item concurrently, so you need a mechanism, probably locking, to prevent them from actioning the same work item multiple times. Even if you adopt some non-locking mechanism for duplicate prevention, you still have to use locking for whatever writes the new data to the structure, and that creates a bottleneck that can seriously hamper throughput: all the readers wait while the writer(s) write.
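For concreteness, the polling-under-lock anti-pattern the paragraph above warns against looks roughly like this (a Python sketch; the structure and field names are hypothetical). Note that every worker re-scans the whole structure on every poll, and every scan serialises on the one lock:

```python
import threading

# The anti-pattern: workers repeatedly scan a shared structure
# under a lock, hunting for unclaimed work.
lsds = {}                 # item_id -> {"data": ..., "claimed": bool}
lsds_lock = threading.Lock()

def claim_next_item():
    """Scan the whole structure under the lock and return the id of
    an unclaimed item, or None. Marking it claimed inside the lock
    is what prevents two workers actioning the same item -- at the
    cost of serialising every poll by every worker."""
    with lsds_lock:
        for item_id, item in lsds.items():
            if not item["claimed"]:
                item["claimed"] = True
                return item_id
    return None
```

The cost grows twice over: scans get slower as the structure grows, and lock contention gets worse as workers are added.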
It is almost always better to have the code that writes to that structure recognise when it is about to add a new work item, and have it push that item onto a shared queue read by all the worker threads. That way, the workers don't waste time searching for something to do--including the high probability that they 'just miss' a new work item on one pass, start over, and then find another worker has claimed it, wasting yet another pass.
This scenario often leads to some workers doing nothing but poll the LSDS, never actually finding anything useful to do. And the more workers there are, the higher the probability of that happening.
With the work-item queue, when a thread is ready to do something, it just reads the next item from the queue. If there is nothing there, it blocks, consuming no resources until something becomes available, and ensuring it doesn't hamper the workers that do have useful work to do--either by locking the LSDS or by uselessly chewing valuable CPU cycles.
And of course, once the writer(s) perform the work-item recognition and queuing, the LSDS itself becomes redundant: you free up that memory and avoid the overheads of locking entirely.
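The producer-queues/workers-block design just described can be sketched as follows (again in Python; Thread::Queue's blocking dequeue gives the same behaviour in Perl; the doubling "processing" step is a stand-in for real work):

```python
import queue
import threading

work_q = queue.Queue()

def producer(items):
    """The code that recognises new work items queues them directly:
    no shared structure for workers to scan."""
    for item in items:
        work_q.put(item)

def consumer(done):
    """Workers block on get(): no polling, no scanning, no lock
    contention. A blocked worker consumes no CPU."""
    while True:
        item = work_q.get()          # blocks until work arrives
        if item is None:             # shutdown sentinel
            break
        done.append(item * 2)        # stand-in for real processing

def run(items, n_workers=2):
    done = []
    workers = [threading.Thread(target=consumer, args=(done,))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    producer(items)
    for _ in workers:
        work_q.put(None)             # one sentinel per worker
    for w in workers:
        w.join()
    return done
```

The queue itself provides all the synchronisation the workers need, which is exactly why the big locked structure becomes unnecessary.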
Your application sounds like an interesting problem to solve, and given sufficient information to allow me to avoid having to speculate about it, I would enjoy considering the possibilities available to you.