Elwood1 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use Parallel::Iterator to fetch a bunch of webpages using basic authentication, then write the contents back to my local disk. I can't seem to pass it the correct data. Can anyone help?
#!/usr/bin/perl
use LWP::UserAgent;
use Parallel::Iterator qw/iterate_as_array/;

# a list of pages to fetch
my @urls = (
    [ name1, user1, password1, url1, enable1 ],
    [ name2, user2, password2, url2, enable2 ],
    [ name3, user3, password3, url3, enable3 ],
);

my $ua = LWP::UserAgent->new();

# this worker fetches a page and returns the HTTP status code
my $worker = sub {
    my $index    = shift;
    my $name     = shift;
    my $username = shift;
    my $pass     = shift;
    my $url      = shift;
    $ua->credentials( $name, $url, $username, $pass );
    my $response = $ua->get($url);
    my $content  = $response->decoded_content();
    return ( $index, $response->code(), $content );
};

my %options = ();
$options{workers} = 5;

# Fetch pages in parallel
my @status_codes = iterate_as_array( \%options, $worker, \@urls );

# Display results
my %codes = ();
@codes{@urls} = @status_codes;

# output results
my $format = "%-40s %s\n";
printf( "$format", 'URL', 'Status' );
foreach my $url ( sort keys %codes ) {
    open( FILEOUT, '>', $name );
    print FILEOUT $content;
    close(FILEOUT);
}

Re: Parallel::Iterator to get multiple pages
by kcott (Archbishop) on Aug 25, 2013 at 04:21 UTC

    G'day Elwood1,

    Welcome to the monastery.

    You really should let Perl tell you what you've done wrong (or even questionably): it's a lot quicker and far less effort than posting on a forum and waiting for someone to reply. Here are some of the things you could have done, with an indication of the feedback you would have got:

    • The warnings pragma would tell you about two problematic variables in your foreach loop.
    • The strict pragma would tell you about fifteen strict subs and two strict vars issues.
    • The autodie pragma would tell you about trying to write to a zero-length filename. You could have also done this with a customised message as shown in the open documentation.

    So, Perl would have reported all those issues if you'd added the following short code to the start of your script.

    use strict;
    use warnings;
    use autodie;

    Note that this is far less typing than what you wrote in this post and the feedback is immediate!

    There are two other potential problems that I can see (that doesn't mean there aren't more). Fixing the issues already highlighted may resolve these, depending on how you rewrite your code; however, as the code currently stands:

    • open (FILEOUT, '>', $name); overwrites the file (given by $name) on each iteration of the foreach loop. Perhaps you want append mode (i.e. '>>' instead of '>') or you want to vary $name such that you're dealing with a different filename each time you go through the loop.
    • With @codes{@urls} = @status_codes;, you're assigning to keys with names like "ARRAY(0xffffffffffff)" (where ffffffffffff is some hexadecimal number). I'm not entirely sure what you want here, but "@codes{ map { $_->[3] } @urls } = @status_codes;" may be closer to the mark. A sketch combining both fixes follows this list.
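
    To illustrate both points, here's a minimal, untested sketch of the results section only. It assumes the worker returns just the status code (as its own comment says), and the scheme for deriving a distinct filename from each URL is my invention, not something from your script:

    my %codes;
    @codes{ map { $_->[3] } @urls } = @status_codes;

    my $format = "%-40s %s\n";
    printf $format, 'URL', 'Status';
    for my $url ( sort keys %codes ) {
        printf $format, $url, $codes{$url};

        # one file per URL: turn the URL into a safe, distinct filename
        ( my $file = $url ) =~ s{[^\w.-]+}{_}g;
        open my $fh, '>', $file;    # autodie will report any failure
        # only the status is available in the parent with the code as it
        # stands; once the content comes back properly, write that instead
        print {$fh} "status: $codes{$url}\n";
        close $fh;
    }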

    I concur with ww's comments. A better question will get you better answers: guidelines for doing so are provided by "How do I post a question effectively?".

    -- Ken

      Thanks for your feedback and tips. The problem seems to be that I'm not able to pass the @url data fully to the worker and then get the contents back. Yes, I've been all over the Parallel::Iterator documentation, but there are only a few examples.
        I've been all over the Parallel::Iterator documentation

        So someone must have silently removed this part from your copy of the documentation:

        How It Works

        The current process is forked once for each worker. Each forked child is connected to the parent by a pair of pipes. The child's STDIN, STDOUT and STDERR are unaffected.

        Input values are serialised (using Storable) and passed to the workers. Completed work items are serialised and returned.

        Caveats

        Parallel::Iterator is designed to be simple to use - but the underlying forking of the main process can cause mystifying problems unless you have an understanding of what is going on behind the scenes.

        Worker execution environment

        All code apart from the worker subroutine executes in the parent process as normal. The worker executes in a forked instance of the parent process. That means that things like this won't work as expected:

        my %tally = ();
        my @r = iterate_as_array(
            sub {
                my ( $id, $name ) = @_;
                $tally{$name}++;    # might not do what you think it does
                return reverse $name;
            },
            \@names
        );

        # Now print out the tally...
        while ( my ( $name, $count ) = each %tally ) {
            printf( "%5d : %s\n", $count, $name );
        }

        Because the worker is a closure it can see the %tally hash from its enclosing scope; but because it's running in a forked clone of the parent process it modifies its own copy of %tally rather than the copy for the parent process.

        That means that after the job terminates the %tally in the parent process will be empty.

        In general you should avoid side effects in your worker subroutines.
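
        Not from the docs, but to round it off: a minimal sketch of the side-effect-free version, which hands each name back through the iterator and builds the tally in the parent (the @names data here is just example input):

        use Parallel::Iterator qw( iterate_as_array );

        my @names = qw( alice bob alice );

        # no side effects: just return each name to the parent
        my @r = iterate_as_array(
            sub {
                my ( $id, $name ) = @_;
                return $name;
            },
            \@names
        );

        # the tally is built in the parent process, so it survives
        my %tally;
        $tally{$_}++ for @r;

        while ( my ( $name, $count ) = each %tally ) {
            printf( "%5d : %s\n", $count, $name );
        }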

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Parallel::Iterator to get multiple pages
by ww (Archbishop) on Aug 25, 2013 at 01:38 UTC
    It's a lot easier (and more likely to be useful) when we try to help with a specific problem, error message, or sample of unexpected output. "I can't seem to pass it the correct data" doesn't really provide enough info to reply cogently. So perhaps you'll come back to add information about what data you've passed and on what basis you inferred that that data was not "correct."

    And, not just BTW, have you exhausted whatever help exists in the doc for Parallel::Iterator?

    If I've misconstrued your question or the logic needed to answer it, I offer my apologies to all those electrons which were inconvenienced by the creation of this post.
      I'm not able to pass the @url data fully to the worker and then get the contents back from the worker.
Re: Parallel::Iterator to get multiple pages
by poj (Abbot) on Aug 25, 2013 at 10:59 UTC
    try
    my $worker = sub {
        my ( $index, $ar ) = @_;
        my ( $name, $username, $pass, $url, $enable ) = @$ar;
        ..
    };
    poj

      Hi poj,

      I'm pretty sure your hint is the solution. May I add an annotation: once this hurdle is taken, Elwood1 will run into problems with his return values.

      Wrap the return code and the content into an anonymous array:

      return ( $index, [ $response->code(), $content ]);

      As far as I can see from the man page, only one return value is allowed besides the index.
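
      Putting poj's unpacking and this return value together gives something like the following. An untested sketch: I'm assuming the record's name field serves as the output filename, and that iterate_as_array returns one arrayref per input, in input order.

      my $worker = sub {
          my ( $index, $ar ) = @_;
          my ( $name, $username, $pass, $url, $enable ) = @$ar;
          $ua->credentials( $name, $url, $username, $pass );
          my $response = $ua->get($url);
          return ( $index, [ $response->code(), $response->decoded_content() ] );
      };

      my @results = iterate_as_array( { workers => 5 }, $worker, \@urls );

      # each result should now be [ $code, $content ]: one file per record,
      # named after the record's name field (an assumption on my part)
      for my $i ( 0 .. $#results ) {
          my ( $code, $content ) = @{ $results[$i] };
          open my $fh, '>', $urls[$i][0];
          print {$fh} $content;
          close $fh;
      }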

      McA