in reply to Parallel::Iterator to get multiple pages

G'day Elwood1,

Welcome to the monastery.

You really should let Perl tell you what you've done wrong (or even questionably): it's a lot quicker and far less effort than posting on a forum and waiting for someone to reply. Here are some of the things you could have done, with an indication of the feedback you would have got:

So, Perl would have reported all those issues if you'd added the following short code to the start of your script.

use strict; use warnings; use autodie;

Note that this is far less typing than what you wrote in this post, and the feedback is immediate!
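
For example, with those three lines in place a simple typo stops the script at compile time instead of silently doing the wrong thing. Here's a minimal sketch (the variable and file names below are made up for illustration):

use strict;
use warnings;
use autodie;

my $count = 0;
$cuont = $count + 1;    # typo: strict aborts compilation with
                        # Global symbol "$cuont" requires explicit package name

open my $fh, '<', 'no_such_file.txt';    # autodie: open throws an exception
                                         # on failure, no need to check its
                                         # return value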

There are two other potential problems that I can see (which doesn't mean there aren't more). Fixing the issues already highlighted may resolve these, depending on how you rewrite your code; however, as the code currently stands:

I concur with ww's comments. A better question will get you better answers: guidelines for doing so are provided by "How do I post a question effectively?".

-- Ken

Re^2: Parallel::Iterator to get multiple pages
by Elwood1 (Initiate) on Aug 25, 2013 at 05:32 UTC
    Thanks for your feedback and tips. The problem seems to be that I'm not able to pass the @url data fully to the worker and then get the contents back. Yes, I've been all over the Parallel::Iterator documentation, but there are only a few examples.
      I've been all over the Parallel::Iterator documentation

      So someone must have silently removed this part from your copy of the documentation:

      How It Works

      The current process is forked once for each worker. Each forked child is connected to the parent by a pair of pipes. The child's STDIN, STDOUT and STDERR are unaffected.

      Input values are serialised (using Storable) and passed to the workers. Completed work items are serialised and returned.

      Caveats

      Parallel::Iterator is designed to be simple to use - but the underlying forking of the main process can cause mystifying problems unless you have an understanding of what is going on behind the scenes.

      Worker execution environment

      All code apart from the worker subroutine executes in the parent process as normal. The worker executes in a forked instance of the parent process. That means that things like this won't work as expected:

      my %tally = ();
      my @r = iterate_as_array( sub {
          my ($id, $name) = @_;
          $tally{$name}++;    # might not do what you think it does
          return reverse $name;
      }, \@names );

      # Now print out the tally...
      while ( my ( $name, $count ) = each %tally ) {
          printf("%5d : %s\n", $count, $name);
      }

      Because the worker is a closure it can see the %tally hash from its enclosing scope; but because it's running in a forked clone of the parent process it modifies its own copy of %tally rather than the copy for the parent process.

      That means that after the job terminates the %tally in the parent process will be empty.

      In general you should avoid side effects in your worker subroutines.
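
      In practical terms: have the worker return whatever it computes and do the aggregation in the parent. A minimal sketch of the same tally written that way (the names here are hypothetical, not taken from your code):

      use strict;
      use warnings;
      use Parallel::Iterator qw( iterate_as_array );

      my @names = qw( alice bob alice carol );

      # The worker only *returns* data; it modifies nothing in the parent.
      my @reversed = iterate_as_array( sub {
          my ( $id, $name ) = @_;
          return scalar reverse $name;
      }, \@names );

      # Aggregate in the parent, where the results actually live.
      my %tally;
      $tally{$_}++ for @reversed;

      while ( my ( $name, $count ) = each %tally ) {
          printf( "%5d : %s\n", $count, $name );
      }

      The only data that crosses the process boundary is the worker's return value, which Parallel::Iterator serialises back to the parent.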

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Clear as mud. That does not explain to me why the expected variables are not passed to the worker.
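
        For what it's worth, the input values do reach the worker: each call receives one ( $index, $value ) pair, serialised via Storable, and whatever the worker returns is serialised back to the parent. A minimal sketch of feeding a list of URLs to the workers and collecting the page contents (the URLs are hypothetical, and LWP::UserAgent is assumed for the fetching):

        use strict;
        use warnings;
        use LWP::UserAgent;
        use Parallel::Iterator qw( iterate_as_array );

        my @urls = (
            'http://example.com/a',    # hypothetical URLs
            'http://example.com/b',
        );

        # Each worker gets ( $index, $url ) and returns the page body,
        # which is serialised back to the parent process.
        my @pages = iterate_as_array( sub {
            my ( $id, $url ) = @_;
            my $resp = LWP::UserAgent->new->get( $url );
            return $resp->is_success ? $resp->decoded_content : '';
        }, \@urls );

        # Results are keyed by the input index, so @pages lines up with @urls.
        for my $i ( 0 .. $#urls ) {
            printf "%s: %d bytes\n", $urls[$i], length $pages[$i];
        }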