in reply to Re: Parallel::Iterator to get multiple pages
in thread Parallel::Iterator to get multiple pages

Thanks for your feedback and tips. The problem seems to be that I'm not able to pass the @url data fully to the worker and then get the contents back. Yes, I've been all over the Parallel::Iterator documentation, but there are only a few examples.

Re^3: Parallel::Iterator to get multiple pages
by afoken (Chancellor) on Aug 25, 2013 at 10:29 UTC
    I've been all over the Parallel::Iterator documentation

    So someone must have silently removed this part from your copy of the documentation:

    How It Works

    The current process is forked once for each worker. Each forked child is connected to the parent by a pair of pipes. The child's STDIN, STDOUT and STDERR are unaffected.

    Input values are serialised (using Storable) and passed to the workers. Completed work items are serialised and returned.
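As a rough illustration of the mechanism described above (not Parallel::Iterator's actual source), here is a minimal sketch that forks a single child, lets it change its own copy of a hash, and sends the result back through a pipe with Storable. It uses one pipe instead of a pair, and the names %tally, @names and $from_child are made up for this example.

```perl
use strict;
use warnings;
use POSIX ();
use Storable qw( fd_retrieve store_fd );

my %tally;                              # lives in the parent
my @names = qw( alice bob alice );      # hypothetical input data

# One pipe for child -> parent results (assumption: simplified from the
# pair of pipes the documentation mentions).
pipe( my $reader, my $writer ) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: this %tally is a copy; the parent never sees these changes.
    close $reader;
    $tally{$_}++ for @names;
    store_fd( \%tally, $writer ) or die "store_fd failed";
    close $writer;                      # flushes the serialised data
    POSIX::_exit(0);                    # leave without running parent cleanup
}

# Parent: the only way to see the child's work is to read it from the pipe.
close $writer;
my $from_child = fd_retrieve($reader);
close $reader;
waitpid( $pid, 0 );

printf "parent tally has %d keys\n", scalar keys %tally;
printf "child sent back  %d keys\n", scalar keys %$from_child;
```

The parent's %tally stays empty; whatever the child wants the parent to have must travel back over the pipe as a serialised value.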

    Caveats

    Parallel::Iterator is designed to be simple to use - but the underlying forking of the main process can cause mystifying problems unless you have an understanding of what is going on behind the scenes.

    Worker execution environment

    All code apart from the worker subroutine executes in the parent process as normal. The worker executes in a forked instance of the parent process. That means that things like this won't work as expected:

        my %tally = ();
        my @r = iterate_as_array(
            sub {
                my ($id, $name) = @_;
                $tally{$name}++;    # might not do what you think it does
                return reverse $name;
            },
            \@names
        );

        # Now print out the tally...
        while ( my ( $name, $count ) = each %tally ) {
            printf("%5d : %s\n", $count, $name);
        }

    Because the worker is a closure it can see the %tally hash from its enclosing scope; but because it's running in a forked clone of the parent process it modifies its own copy of %tally rather than the copy for the parent process.

    That means that after the job terminates the %tally in the parent process will be empty.

    In general you should avoid side effects in your worker subroutines.
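One way to follow that advice, sketched under the assumption that Parallel::Iterator is installed: the worker only returns a value, and all bookkeeping happens in the parent on the returned list. The sample @names and the uc() transformation are invented for illustration; a page-fetching worker would return the page content in the same way.

```perl
use strict;
use warnings;
use Parallel::Iterator qw( iterate_as_array );

my @names = qw( alice bob alice );      # hypothetical input data

# The worker gets ( $index, $value ) and communicates only via its
# return value; it does not touch any variable outside itself.
my @upper = iterate_as_array(
    sub {
        my ( $id, $name ) = @_;
        return uc $name;
    },
    \@names                             # the input is passed as a reference
);

# Tally in the parent, where %tally actually lives.
my %tally;
$tally{$_}++ for @upper;

printf "%5d : %s\n", $tally{$_}, $_ for sort keys %tally;
```

Results come back in input order, so $upper[$i] corresponds to $names[$i].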

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      clear as mud. That does not explain to me why the expected variables are not passed to the worker.