Re: Adding parallel processing to a working Perl script

Personally, I would be VERY wary of trying to retrofit threading. There are plenty of big, scary bugs lurking in threaded code, waiting to bite you. I would strongly recommend that you assume you'll need a rewrite from scratch, and then borrow from your original source.

It at least looks like you're dealing with an implicitly parallel problem, so I would suggest:

1. Redraft your code so that you have a 'worker' subroutine which handles one thing at a time. (Multiple worker subs are OK if you have different cases to handle and want to parallelise each of them.)
2. Use Thread::Queue, and 'feed' your worker from a queue. Get this working unthreaded first.
3. Then treat your 'worker' sub as a thread, and spawn several of them.

Here's a really basic template for what I mean: A basic 'worker' threading example.
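
The linked post isn't reproduced here, so what follows is not its code. It is only a minimal sketch of the three steps above, under some assumptions: the item list, the thread count of 2, and the "work" done in `worker` (appending the length of the string) are all placeholders, and `end()` needs a reasonably recent Thread::Queue (on older releases you would enqueue one `undef` per thread instead).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $work_q   = Thread::Queue->new();    # items waiting to be processed
my $result_q = Thread::Queue->new();    # finished results

# Hypothetical worker: takes one item at a time until the queue is ended.
# While developing you can call worker() directly, with no threads at all.
sub worker {
    while ( defined( my $item = $work_q->dequeue() ) ) {
        # Placeholder for the real per-item work.
        $result_q->enqueue( "$item:" . length($item) );
    }
    return;
}

my @items = qw( alpha beta gamma delta );    # illustrative input only
$work_q->enqueue(@items);
$work_q->end();    # no more work: dequeue() returns undef once drained

my $nthreads = 2;  # arbitrary for this sketch
my @threads  = map { threads->create( \&worker ) } 1 .. $nthreads;
$_->join() for @threads;

# Collect everything the workers produced. Arrival order is not
# predictable, so sort before printing.
$result_q->end();
my @results;
while ( defined( my $r = $result_q->dequeue() ) ) {
    push @results, $r;
}
print "$_\n" for sort @results;
```

The worker only ever talks to the two queues, which is what makes it easy to go from one thread to many.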

(Not to denigrate the perfectly sound advice other Monks have offered. This is purely my opinion as to how I would approach your problem)

Re^2: Adding parallel processing to a working Perl script by Jim (Curate) on Apr 26, 2014 at 06:15 UTC
This is a terrific response. Thank you very much, Preceptor. Your post titled A basic 'worker' threading example is exactly the kind of beginning Perl threads tutorial I was looking for. I'll study it this weekend and then try to apply its lessons to my application.

"Redraft your code such that you have a 'worker' subroutine, which handles one thing at a time."

Here's my refactored code. My intention was to make it readily adaptable to threading. The intended 'worker' subroutine is `probe_volume()`. I've probably missed the mark entirely, but with guidance from you and other kind monks, I'm hoping I can finally write my first truly useful parallel program.
Re^3: Adding parallel processing to a working Perl script by Preceptor (Deacon) on Apr 28, 2014 at 10:37 UTC
I think you may still be trying to pass a bit too much back and forth. Thread::Queue is a lovely way of handling queuing, but it works best with single values. You're passing a hash into probe_volume, which works single-threaded but can get quite complicated when multithreading.

I think you need to step back a little and consider the design. Threading increases throughput through parallelism, but as a result your threads run asynchronously and non-deterministically: you will never know in which order they will complete their tasks. You therefore can't do something like 'print probe_volume'; you'll have to collate your data and (potentially) reorder it first.

You will also need to think about sharing variables. You pass a hash into probe_volume and return a list, which will probably cause you pain. Sharing variables between threads is potentially quite complicated and a source of some really annoying bugs. Try to avoid doing it.

I would therefore suggest that what you want is a 'standalone' probe_volume subroutine that takes _just_ a volume name (either passed via sub call or, ideally, 'fed' through a Thread::Queue) and outputs the results (again, returned via sub call or Thread::Queue) without using anything from the global namespace. (Read-only access to e.g. command definitions would be OK.)
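
To make that concrete, here is a rough sketch of the shape I mean. It is not your code: the body of `probe_volume` is a stand-in (it just upper-cases the name), the volume names and the thread count are invented, and the tab-separated result string is only one possible way to keep each queue item a single value. It also assumes a Thread::Queue recent enough to have `end()`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $volume_q = Thread::Queue->new();    # volume names in
my $result_q = Thread::Queue->new();    # one result string per volume out

# Stand-in for the real probe: takes just a volume name, returns one
# string, and touches no globals. The real probing logic is not shown.
sub probe_volume {
    my ($volume) = @_;
    return uc $volume;
}

sub worker {
    while ( defined( my $volume = $volume_q->dequeue() ) ) {
        my $status = probe_volume($volume);
        # Tag the result with its volume name so it can be matched up later.
        $result_q->enqueue("$volume\t$status");
    }
    return;
}

my @volumes = qw( vol3 vol1 vol2 );     # hypothetical names
my @threads = map { threads->create( \&worker ) } 1 .. 2;

$volume_q->enqueue(@volumes);
$volume_q->end();
$_->join() for @threads;
$result_q->end();

# Collate: results arrive in whatever order the threads finished.
my %status_for;
while ( defined( my $line = $result_q->dequeue() ) ) {
    my ( $volume, $status ) = split /\t/, $line, 2;
    $status_for{$volume} = $status;
}

# Reorder: report in the original input order, not completion order.
my @report = map { "$_: $status_for{$_}" } @volumes;
print "$_\n" for @report;
```

Printing only happens in the main thread, after everything has been collated, so the output order no longer depends on which thread finished first.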

