Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I seek your wisdom. I am pretty new to Perl.

I'm using Strawberry Perl on Windows XP to download multiple HTML pages, and I want each one in its own variable.

Right now I'm doing this, but as I see it, it fetches one page at a time and doesn't start the next until the current one has finished downloading:

my $page  = `curl -s http://mysite.com/page -m 2`;
my $page2 = `curl -s http://myothersite.com/page -m 2`;

There are about four links in total, so I wanted to keep it as simple as possible.

I looked into Parallel::ForkManager but couldn't get it to work. I also tried putting the Windows command start before curl, but that doesn't capture the page. Is there a simpler way to do this?

Thank you in advance.

Re: simple multithreading with curl
by BrowserUk (Patriarch) on May 19, 2013 at 18:31 UTC

    Not quite a one-liner:

    #! perl -slw
    use strict;
    use threads;
    use LWP::Simple;

    my @pages = map $_->join,
                map async( sub { get "http://$_[0]"; }, $_ ),
                qw[ www.bbc.co.uk www.ibm.com www.cnn.com www.microsoft.com ];

    print substr $_, 0, 100 for @pages;

    __END__
    C:\test>1034235.pl
    <!DOCTYPE html> <html lang="en-GB" > <head> <!-- Barlesque 2.45.9 --> <meta http-equiv="Content-Type
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-str
    <!DOCTYPE HTML> <html lang="en-US"> <head> <title>CNN.com International - Breaking, World, Business
    <!DOCTYPE html> <html class="en-gb no-js" lang="en" dir="ltr" xmlns:bi="urn:schemas-microsoft-com:m

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I forgot to mention something:
      I need to handle each downloaded page differently, so I need to know which variable each page is in.
      I thought this would be easier for a noob like me.
      When reading the manuals and tutorials it seems so easy :)
        so I need to know which variable each page is in

        The pages will be in the array in the same order as the URLs are in the list.
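
        If named access is easier to follow than array positions, one variation (a sketch of my own, not from the original reply) pairs each URL with its page in a hash:

        use strict;
        use warnings;
        use threads;
        use LWP::Simple;

        my @urls = qw( www.bbc.co.uk www.ibm.com www.cnn.com www.microsoft.com );

        # join returns results in the same order as @urls, so a hash slice pairs them up.
        my @pages = map { $_->join }
                    map { threads->create( sub { get "http://$_[0]" }, $_ ) } @urls;

        my %page_for;
        @page_for{@urls} = @pages;

        # Now each page is addressable by its URL, e.g.:
        print length( $page_for{'www.bbc.co.uk'} // '' ), " bytes from the BBC\n";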


Re: simple multithreading with curl
by kennethk (Abbot) on May 19, 2013 at 18:07 UTC

    What did your Parallel::ForkManager code look like? Why didn't it work? We can't debug what we never see.
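
    For reference, getting data back out of Parallel::ForkManager children is the usual stumbling block: each child is a separate process, so results must be passed back through the run_on_finish callback. A minimal sketch (mine, not from the thread, reusing the question's placeholder URLs) would be:

    use strict;
    use warnings;
    use LWP::Simple;
    use Parallel::ForkManager;

    my @urls = ( 'http://mysite.com/page', 'http://myothersite.com/page' );
    my %page_for;

    my $pm = Parallel::ForkManager->new(4);

    # Runs in the parent after each child exits; $data is whatever the
    # child handed to finish().
    $pm->run_on_finish( sub {
        my ( $pid, $exit, $ident, $signal, $core, $data ) = @_;
        $page_for{$ident} = $$data if defined $data;
    } );

    for my $url (@urls) {
        $pm->start($url) and next;    # parent: tag the child with its URL
        my $content = get($url);      # child: do the download
        $pm->finish( 0, \$content );  # child: ship the page back, then exit
    }
    $pm->wait_all_children;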

    Why are you using a command-line call to curl when LWP::UserAgent does all this in Perl, and gives you more convenient error handling?
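
    As an illustration of that point (a sketch reusing the question's placeholder URL and matching curl's -m 2 timeout), the sequential version in pure Perl is only a few lines:

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua  = LWP::UserAgent->new( timeout => 2 );  # same cutoff as curl -m 2
    my $res = $ua->get('http://mysite.com/page');

    if ( $res->is_success ) {
        my $page = $res->decoded_content;           # the HTML as decoded text
    }
    else {
        warn 'Fetch failed: ', $res->status_line;   # e.g. "404 Not Found"
    }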

    Why does it matter that you are downloading in series rather than parallel? Given how much more complicated any parallel/threaded code is, the time penalty for just waiting for the download is probably lower than the cost of your time in trying to code it and get it working.

    All that having been said, threads is core and will probably get your job done with the minimum of fuss. Demo code:

    use strict;
    use warnings;
    use LWP::UserAgent;
    use threads;

    my @websites = (
        'http://mysite.com/page',
        'http://myothersite.com/page',
        'http://myotherothersite.com/page',
        'http://myotherotherothersite.com/page',
    );

    my @threads;
    for my $url (@websites) {
        push @threads, threads->create( \&fetch, $url );
    }

    my @pages;
    for my $thread (@threads) {
        push @pages, $thread->join;
    }

    sub fetch {
        my $url = shift;
        my $ua = LWP::UserAgent->new;
        my $result = $ua->get($url);
        return $result->is_success ? $result->decoded_content : "Page retrieval failed";
    }
    Or, if you are comfortable with map:
    my @pages = map $_->join, map threads->create(\&fetch, $_), @websites;

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: simple multithreading with curl
by choroba (Cardinal) on May 19, 2013 at 18:12 UTC
    Crossposted at StackOverflow. It is considered courtesy to inform about crossposting so people not attending both sites do not waste their efforts in solving a problem already solved at the other corner of the Internets.
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Sorry about that, I didn't know I should mention it.