Elwood1 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use Parallel::Iterator to fetch a bunch of webpages using basic authentication, then write the contents back to my local disk. I can't seem to pass it the correct data. Can anyone help?
#!/usr/bin/perl
use LWP::UserAgent;
use Parallel::Iterator qw/iterate_as_array/;

# a list of pages to fetch
my @urls = (
    [ name1, user1, password1, url1, enable1 ],
    [ name2, user2, password2, url2, enable2 ],
    [ name3, user3, password3, url3, enable3 ],
);

my $ua = LWP::UserAgent->new();

# this worker fetches a page and returns the HTTP status code
my $worker = sub {
    my $index    = shift;
    my $name     = shift;
    my $username = shift;
    my $pass     = shift;
    my $url      = shift;
    $ua->credentials( $name, $url, $username, $pass );
    my $response = $ua->get($url);
    my $content  = $response->decoded_content();
    return ( $index, $response->code(), $content );
};

my %options = ();
$options{workers} = 5;

# Fetch pages in parallel
my @status_codes = iterate_as_array( \%options, $worker, \@urls );

# Display results
my %codes = ();
@codes{@urls} = @status_codes;

# output results
my $format = "%-40s %s\n";
printf( "$format", 'URL', 'Status' );
foreach my $url ( sort keys %codes ) {
    open( FILEOUT, '>', $name );
    print FILEOUT $content;
    close(FILEOUT);
}

Re: Parallel::Iterator to get multiple pages
by kcott (Archbishop) on Aug 25, 2013 at 04:21 UTC

    G'day Elwood1,

    Welcome to the monastery.

    You really should let Perl tell you what you've done wrong (or even questionably): it's a lot quicker and far less effort than posting on a forum and waiting for someone to reply. Here are some of the things you could have done, with an indication of the feedback you would have got:

    • The warnings pragma would tell you about two problematic variables in your foreach loop.
    • The strict pragma would tell you about fifteen strict subs and two strict vars issues.
    • The autodie pragma would tell you about trying to write to a zero-length filename. You could have also done this with a customised message as shown in the open documentation.

    So, Perl would have reported all those issues if you'd added the following short code to the start of your script.

    use strict;
    use warnings;
    use autodie;

    Note that this is far less typing than what you wrote in this post and the feedback is immediate!

    There are two other potential problems that I can see (that doesn't mean there aren't more). Fixing the issues already highlighted may resolve these, depending on how you rewrite your code; however, as the code currently stands:

    • open (FILEOUT, '>', $name); overwrites the file (given by $name) on each iteration of the foreach loop. Perhaps you want append mode (i.e. '>>' instead of '>') or you want to vary $name such that you're dealing with a different filename each time you go through the loop.
    • With @codes{@urls} = @status_codes;, you're assigning to keys with names like "ARRAY(0xffffffffffff)" (where ffffffffffff is some hexadecimal number). I'm not entirely sure what you want here, but "@codes{ map { $_->[3] } @urls } = @status_codes;" may be closer to the mark. A sketch combining both fixes follows this list.
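
    To illustrate both points, here's a minimal, untested sketch of the results section only. It assumes the worker returns just the status code (as its own comment says), and the scheme for deriving a distinct filename from each URL is my invention, not something from your script:

    my %codes;
    @codes{ map { $_->[3] } @urls } = @status_codes;

    my $format = "%-40s %s\n";
    printf $format, 'URL', 'Status';
    for my $url ( sort keys %codes ) {
        printf $format, $url, $codes{$url};

        # one file per URL: turn the URL into a safe, distinct filename
        ( my $file = $url ) =~ s{[^\w.-]+}{_}g;
        open my $fh, '>', $file;    # autodie will report any failure
        # only the status is available in the parent with the code as it
        # stands; once the content comes back properly, write that instead
        print {$fh} "status: $codes{$url}\n";
        close $fh;
    }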

    I concur with ww's comments. A better question will get you better answers: guidelines for doing so are provided by "How do I post a question effectively?".

    -- Ken

      Thanks for your feedback and tips. The problem seems to be that I'm not able to pass the @url data fully to the worker and then get the contents back. Yes, I've been all over the Parallel::Iterator documentation, but there are only a few examples.
        I've been all over the Parallel::Iterator documentation

        So someone must have silently removed this part from your copy of the documentation:

        How It Works

        The current process is forked once for each worker. Each forked child is connected to the parent by a pair of pipes. The child's STDIN, STDOUT and STDERR are unaffected.

        Input values are serialised (using Storable) and passed to the workers. Completed work items are serialised and returned.

        Caveats

        Parallel::Iterator is designed to be simple to use - but the underlying forking of the main process can cause mystifying problems unless you have an understanding of what is going on behind the scenes.

        Worker execution environment

        All code apart from the worker subroutine executes in the parent process as normal. The worker executes in a forked instance of the parent process. That means that things like this won't work as expected:

        my %tally = ();
        my @r = iterate_as_array(
            sub {
                my ( $id, $name ) = @_;
                $tally{$name}++;    # might not do what you think it does
                return reverse $name;
            },
            \@names
        );

        # Now print out the tally...
        while ( my ( $name, $count ) = each %tally ) {
            printf( "%5d : %s\n", $count, $name );
        }

        Because the worker is a closure it can see the %tally hash from its enclosing scope; but because it's running in a forked clone of the parent process it modifies its own copy of %tally rather than the copy for the parent process.

        That means that after the job terminates the %tally in the parent process will be empty.

        In general you should avoid side effects in your worker subroutines.
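
        Not from the docs, but to round it off: a minimal sketch of the side-effect-free version, which hands each name back through the iterator and builds the tally in the parent (the @names data here is just example input):

        use Parallel::Iterator qw( iterate_as_array );

        my @names = qw( alice bob alice );

        # no side effects: just return each name to the parent
        my @r = iterate_as_array(
            sub {
                my ( $id, $name ) = @_;
                return $name;
            },
            \@names
        );

        # the tally is built in the parent process, so it survives
        my %tally;
        $tally{$_}++ for @r;

        while ( my ( $name, $count ) = each %tally ) {
            printf( "%5d : %s\n", $count, $name );
        }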

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Parallel::Iterator to get multiple pages
by ww (Archbishop) on Aug 25, 2013 at 01:38 UTC
    It's a lot easier (and more likely to be useful) when we try to help with a specific problem, error message, or sample of unexpected output. "I can't seem to pass it the correct data" doesn't really provide enough info to reply cogently. So perhaps you'll come back to add information about what data you've passed and on what basis you inferred that that data was not "correct."

    And, not just BTW, have you exhausted whatever help exists in the doc for Parallel::Iterator?

    If I've misconstrued your question or the logic needed to answer it, I offer my apologies to all those electrons which were inconvenienced by the creation of this post.
      I'm not able to pass the @url data fully to the worker and then get the contents back from the worker.
Re: Parallel::Iterator to get multiple pages
by poj (Abbot) on Aug 25, 2013 at 10:59 UTC
    try
    my $worker = sub {
        my ( $index, $ar ) = @_;
        my ( $name, $username, $pass, $url, $enable ) = @$ar;
        ..
    };
    poj

      Hi poj,

      I'm pretty sure your hint is the solution. May I add an annotation: once this hurdle is taken, Elwood1 will run into problems with his return values.

      Wrap the return code and the content into an anonymous array:

      return ( $index, [ $response->code(), $content ]);

      As far as I can see from the man page, only one return value is allowed besides the index.
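
      Putting poj's unpacking and this return value together gives something like the following. An untested sketch: I'm assuming the record's name field serves as the output filename, and that iterate_as_array returns one arrayref per input, in input order.

      my $worker = sub {
          my ( $index, $ar ) = @_;
          my ( $name, $username, $pass, $url, $enable ) = @$ar;
          $ua->credentials( $name, $url, $username, $pass );
          my $response = $ua->get($url);
          return ( $index, [ $response->code(), $response->decoded_content() ] );
      };

      my @results = iterate_as_array( { workers => 5 }, $worker, \@urls );

      # each result should now be [ $code, $content ]: one file per record,
      # named after the record's name field (an assumption on my part)
      for my $i ( 0 .. $#results ) {
          my ( $code, $content ) = @{ $results[$i] };
          open my $fh, '>', $urls[$i][0];
          print {$fh} $content;
          close $fh;
      }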

      McA