in reply to Re^7: adding a hash to a shared object
in thread adding a hash to a shared object

Thank you. I can see how this allows the tasks to be scheduled correctly, but I'm still missing how to deal with data that each subroutine generates and that I want to store somewhere outside it, rather than just pass on to the next function.

For example, some subroutines create small hashes, which I want to combine at the end of the program and write out as a JSON file.

I should also note that my use of your scheme will probably be very degenerate, since each subroutine is called only once (i.e. with a single payload...).

Re^9: adding a hash to a shared object
by Corion (Patriarch) on Aug 11, 2010 at 13:18 UTC

    If your subroutines are only ever called once, what gains do you hope to get from parallelizing them?

    My approach would be to not store any data but to forward it to whichever subroutine needs it. If you want to do some processing after a subroutine has consumed all the data passed to it, do it right there:

    sub accumulate {
        async {
            my %totals;
            while (defined (my $payload = $q2->dequeue())) {
                $totals{ $payload }++;
            };
            # Processing has finished
            print_to_json(\%totals);
        };
    };
      Each subroutine is called once, but I want to parallelize the calls to the different subroutines. Suppose sub1 takes a minute to run and so does sub2, and they do not depend on each other, so I can run them concurrently.

        I'm confused. First you say:

        My perl script is used to run some kind of a pipeline. I start by reading a JSON file with a bunch of parameters in it. I then do some work - mainly building some data structures needed later and calling external programs that generate some output files I keep references to.

        ... but now you say the subroutines don't use any common data. I think I need more explanation as to what your actual situation is.

        If your subroutines don't need any common data, consider making them into separate programs, or simply running them in separate threads (if you really want to use threads, which I would avoid in that case). If your subroutines depend on data prepared by each other, make them communicate through queues. If your subroutines depend on data read by the main program, feed each subroutine its data through a queue.
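
        As a rough sketch of the "separate threads plus a queue" option, assuming a threads-enabled perl and a Thread::Queue recent enough to copy hash references placed on the queue: each subroutine runs once in its own thread and pushes its small hash onto a results queue, and the main thread merges the pieces once all workers have finished. The bodies of sub1 and sub2, the key names and the file names are made up for illustration, and the JSON is printed to STDOUT rather than written to a file.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use JSON::PP;

# Hypothetical stand-ins for the real subroutines; each returns a small hash.
sub sub1 { return { sub1_output => 'a.txt' } }
sub sub2 { return { sub2_output => 'b.txt' } }

# Queue on which every worker leaves its result.
my $results = Thread::Queue->new;

# Run each subroutine once, in its own thread, and queue whatever it returns.
my @workers = map {
    my $code = $_;
    threads->create(sub { $results->enqueue($code->()) });
} (\&sub1, \&sub2);

# Wait until every worker has finished.
$_->join for @workers;

# The queue now holds one hash per subroutine; merge them in the main thread.
my %merged;
while (defined(my $part = $results->dequeue_nb)) {
    %merged = (%merged, %$part);
}

# Emit the combined data as JSON (keys sorted for stable output).
print JSON::PP->new->canonical->encode(\%merged), "\n";
```

        Because only the main thread touches %merged, no locking is needed. If two subroutines could produce the same key, the plain merge above would silently keep only one of the values, so a real script would need its own rule for that case.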