in reply to Parallel::Iterator to get multiple pages

G'day Elwood1,

Welcome to the monastery.

You really should let Perl tell you what you've done wrong (or even questionably): it's a lot quicker and far less effort than posting on a forum and waiting for someone to reply. Here are some of the things you could have done, with an indication of the feedback you would have got:

So, Perl would have reported all those issues if you'd added the following short code to the start of your script.

use strict; use warnings; use autodie;

Note that this is far less typing than what you wrote in this post, and the feedback is immediate!
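
For example, with those three lines in place a simple typo stops the script at compile time instead of silently doing the wrong thing. Here's a minimal sketch (the variable and file names below are made up for illustration):

use strict;
use warnings;
use autodie;

my $count = 0;
$cuont = $count + 1;    # typo: strict aborts compilation with
                        # Global symbol "$cuont" requires explicit package name

open my $fh, '<', 'no_such_file.txt';    # autodie: open throws an exception
                                         # on failure, no need to check its
                                         # return value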

There are two other potential problems that I can see (which doesn't mean there aren't more). Fixing the issues already highlighted may resolve these, depending on how you rewrite your code; however, as the code currently stands:

I concur with ww's comments. A better question will get you better answers: guidelines for doing so are provided by "How do I post a question effectively?".

-- Ken

Re^2: Parallel::Iterator to get multiple pages
by Elwood1 (Initiate) on Aug 25, 2013 at 05:32 UTC
    Thanks for your feedback and tips. The problem seems to be that I'm not able to pass the @url data fully to the worker and then get the contents back. Yes, I've been all over the Parallel::Iterator documentation, but there are only a few examples.
      I've been all over the Parallel::Iterator documentation

      So someone must have silently removed this part from your copy of the documentation:

      How It Works

      The current process is forked once for each worker. Each forked child is connected to the parent by a pair of pipes. The child's STDIN, STDOUT and STDERR are unaffected.

      Input values are serialised (using Storable) and passed to the workers. Completed work items are serialised and returned.

      Caveats

      Parallel::Iterator is designed to be simple to use - but the underlying forking of the main process can cause mystifying problems unless you have an understanding of what is going on behind the scenes.

      Worker execution environment

      All code apart from the worker subroutine executes in the parent process as normal. The worker executes in a forked instance of the parent process. That means that things like this won't work as expected:

      my %tally = ();
      my @r = iterate_as_array( sub {
          my ($id, $name) = @_;
          $tally{$name}++;    # might not do what you think it does
          return reverse $name;
      }, \@names );

      # Now print out the tally...
      while ( my ( $name, $count ) = each %tally ) {
          printf("%5d : %s\n", $count, $name);
      }

      Because the worker is a closure it can see the %tally hash from its enclosing scope; but because it's running in a forked clone of the parent process it modifies its own copy of %tally rather than the copy for the parent process.

      That means that after the job terminates the %tally in the parent process will be empty.

      In general you should avoid side effects in your worker subroutines.
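
      In practical terms: have the worker return whatever it computes and do the aggregation in the parent. A minimal sketch of the same tally written that way (the names here are hypothetical, not taken from your code):

      use strict;
      use warnings;
      use Parallel::Iterator qw( iterate_as_array );

      my @names = qw( alice bob alice carol );

      # The worker only *returns* data; it modifies nothing in the parent.
      my @reversed = iterate_as_array( sub {
          my ( $id, $name ) = @_;
          return scalar reverse $name;
      }, \@names );

      # Aggregate in the parent, where the results actually live.
      my %tally;
      $tally{$_}++ for @reversed;

      while ( my ( $name, $count ) = each %tally ) {
          printf( "%5d : %s\n", $count, $name );
      }

      The only data that crosses the process boundary is the worker's return value, which Parallel::Iterator serialises back to the parent.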

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Clear as mud. That does not explain to me why the expected variables are not passed to the worker.
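
        For what it's worth, the input values do reach the worker: each call receives one ( $index, $value ) pair, serialised via Storable, and whatever the worker returns is serialised back to the parent. A minimal sketch of feeding a list of URLs to the workers and collecting the page contents (the URLs are hypothetical, and LWP::UserAgent is assumed for the fetching):

        use strict;
        use warnings;
        use LWP::UserAgent;
        use Parallel::Iterator qw( iterate_as_array );

        my @urls = (
            'http://example.com/a',    # hypothetical URLs
            'http://example.com/b',
        );

        # Each worker gets ( $index, $url ) and returns the page body,
        # which is serialised back to the parent process.
        my @pages = iterate_as_array( sub {
            my ( $id, $url ) = @_;
            my $resp = LWP::UserAgent->new->get( $url );
            return $resp->is_success ? $resp->decoded_content : '';
        }, \@urls );

        # Results are keyed by the input index, so @pages lines up with @urls.
        for my $i ( 0 .. $#urls ) {
            printf "%s: %d bytes\n", $urls[$i], length $pages[$i];
        }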