PerlMonks  

Re: Adding parallel processing to a working Perl script

by Preceptor (Deacon)
on Apr 22, 2014 at 10:16 UTC ( [id://1083123]=note )


in reply to Adding parallel processing to a working Perl script

Personally, I would be VERY wary of trying to retrofit threading onto a working script. Threads bring their own big, scary bugs (race conditions, deadlocks, accidentally shared data) that single-threaded code never had to guard against. I would strongly recommend assuming you'll need a rewrite from scratch, borrowing from your original source as you go.

It at least looks like you're dealing with an implicitly parallel problem, so I would suggest:

  • Redraft your code so that you have a 'worker' subroutine which handles one thing at a time. (Multiple worker subs are OK if you have different cases to handle and want to parallelise each.)
  • Use Thread::Queue to 'feed' your worker from a queue, and get that working unthreaded first.
  • Then treat your 'worker' sub as the body of a thread, and spawn several of them.

Here's a really basic template for what I mean: A basic 'worker' threading example.
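The linked node isn't reproduced here. As a rough sketch of the three steps listed above (not the code from that node), assuming a perl built with ithreads and a Thread::Queue recent enough to have end(); the worker body and the item names are placeholders:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# One queue feeds the workers, a second one carries results back.
my $work_q   = Thread::Queue->new();
my $result_q = Thread::Queue->new();

# Hypothetical worker: takes one item at a time off the queue.
# dequeue() blocks until an item arrives, and returns undef once
# the queue has been end()ed and drained.
sub worker {
    while ( defined( my $item = $work_q->dequeue() ) ) {
        $result_q->enqueue("$item processed");    # stand-in for real work
    }
    return;
}

# Spawn several copies of the same worker.
my @workers = map { threads->create( \&worker ) } 1 .. 3;

# Feed the queue, then signal that no more work is coming.
$work_q->enqueue($_) for qw(vol_a vol_b vol_c vol_d);
$work_q->end();

# Wait for every worker to finish before reading the results.
$_->join() for @workers;
$result_q->end();

my @results;
while ( defined( my $line = $result_q->dequeue() ) ) {
    push @results, $line;
}

# Completion order is not predictable, so sort before printing.
print "$_\n" for sort @results;
```

Calling worker() directly instead of through threads->create (after enqueueing and end()ing the work queue) runs the same loop unthreaded, which is a handy way to debug the worker first.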

(Not to denigrate the perfectly sound advice other Monks have offered. This is purely my opinion as to how I would approach your problem)

Re^2: Adding parallel processing to a working Perl script
by Jim (Curate) on Apr 26, 2014 at 06:15 UTC

    This is a terrific response. Thank you very much, Preceptor. Your post titled A basic 'worker' threading example is exactly the kind of beginning Perl threads tutorial I was looking for. I'll study it this weekend and then try to apply its lessons to my application.

    Redraft your code such that you have a 'worker' subroutine, which handles one thing at a time.

    Here's my refactored code. My intention was to make it readily adaptable to threading. The intended 'worker' subroutine is probe_volume(). I've probably missed the mark entirely, but with guidance from you and other kind monks, I'm hoping I can finally write my first truly useful parallel program.

      I think you may still be trying to pass a bit too much back and forth. Thread::Queue is a lovely way of handling queuing, but it works best with single values. You're passing a hash into probe_volume, which works fine single-threaded but can get quite complicated once you go multithreaded.

      I think you need to step back a little and consider the design. Threading increases throughput through parallelism, but as a result your threads run asynchronously and non-deterministically: you will never know in which order they will complete their tasks. You therefore can't do something like 'print probe_volume'; you'll have to collate your data and (potentially) reorder it first.
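      One way to collate and reorder, sketched under assumptions: each job is tagged with its position in the input list, the tag travels with the result as a single tab-separated string, and the main thread slots each result back into place. The volume names and the uc() 'work' are made up for illustration.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my @volumes = qw(vol_c vol_a vol_b);    # hypothetical input, in the order we want output

my $work_q   = Thread::Queue->new();
my $result_q = Thread::Queue->new();

# Each job and each result is a single string: "index<TAB>payload".
sub worker {
    while ( defined( my $job = $work_q->dequeue() ) ) {
        my ( $index, $volume ) = split /\t/, $job, 2;
        my $report = uc $volume;        # stand-in for the real probe
        $result_q->enqueue( join "\t", $index, $report );
    }
    return;
}

my @workers = map { threads->create( \&worker ) } 1 .. 2;

$work_q->enqueue( join "\t", $_, $volumes[$_] ) for 0 .. $#volumes;
$work_q->end();

$_->join() for @workers;
$result_q->end();

# Results arrive in whatever order the threads finished;
# the index puts each one back where it belongs.
my @ordered;
while ( defined( my $reply = $result_q->dequeue() ) ) {
    my ( $index, $report ) = split /\t/, $reply, 2;
    $ordered[$index] = $report;
}

print "$_\n" for @ordered;
```

      Printing only happens in the main thread, after everything has been joined, so the output order matches the input order regardless of which thread finished first.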

      You will also need to think about sharing variables - you pass a hash into probe_volume, and return a list. This will probably cause you pain. Sharing variables between threads is potentially quite complicated and a source of some really annoying bugs. Try to avoid doing it.

      I would therefore suggest that what you want is a 'standalone' probe_volume subroutine that takes _just_ a volume name (passed via a sub call, or ideally 'fed' through a Thread::Queue) and outputs the results (again, returned via the sub call or through a Thread::Queue) without using anything from the global namespace. (Read-only access to e.g. command definitions would be OK.)
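      A minimal sketch of that shape, with loud assumptions: this probe_volume is a stand-in rather than your real one, the %COMMAND table and its 'df -k %s' template are invented, and nothing is actually executed. The point is only that the sub takes one scalar in, hands one scalar back, and merely reads the command table.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical command definitions. Filled in once at startup and only
# read afterwards, so each thread can safely work from its own copy.
my %COMMAND = ( usage => 'df -k %s' );

# Stand-in for the real probe: one volume name in, one string out.
sub probe_volume {
    my ($volume) = @_;
    my $command = sprintf $COMMAND{usage}, $volume;
    # A real version would run $command and parse its output here.
    return "$volume: would run '$command'";
}

# Called directly, unthreaded.
my $direct = probe_volume('/vol/a');

# The very same sub, fed through queues from a thread.
my $in  = Thread::Queue->new();
my $out = Thread::Queue->new();

my $thr = threads->create(
    sub {
        while ( defined( my $volume = $in->dequeue() ) ) {
            $out->enqueue( probe_volume($volume) );
        }
    }
);

$in->enqueue('/vol/a');
$in->end();
$thr->join();

my $queued = $out->dequeue();

print "$direct\n$queued\n";
```

      Because probe_volume neither writes to shared state nor prints, the direct call and the queued call give the same answer, and the sub can be tested on its own before any threads are involved.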
