I couldn't resist posting this here -- even though it's at best a code snippet, and I'm less than a duffer, ranker than an armchair -- just cuz the context is so appropriate, in a self-referential sort of way. (Notwithstanding the inadequacies, at least it's short.)

It's a little script to download the cumulative "Cool Uses for Perl" pages and string them into one big page on my HD. Yeah I know. I should've filtered and scraped and WWW-Mechanize'd away all the duplicated headers and footers and sidebars and stuff. A rainy day perhaps ...

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

my $path = "http://www.perlmonks.org/?";
my $fn   = "coolness.html";

open TO_FILE, '>>', $fn or die "Can't open $fn: $!";

# The CUFP index shows 15 nodes per page, so walk next=0, 15, ..., 285
for ( my $i = 0; $i < 300; $i += 15 ) {
    my $url = $path . "next=" . $i . ";node_id=1044";
    warn $url, "\n";
    my $doc = get($url);
    print TO_FILE $doc;
}

close TO_FILE;

(But I'm curious, maybe someone knows: could the for( ) loop be replaced by a while(< >) construct? It looks a bit verbose, that $doc variable shunting inbound pages to the filehandle. Um?)

(Be kind. You were all once newbies too.)


Re: Cool Uses, re-used
by davido (Cardinal) on Nov 14, 2004 at 17:09 UTC

    LWP::Simple doesn't populate the diamond operator, so to answer that question: no, you can't use while(<>){...}. Maybe you could tie a class to a filehandle that does the LWP::Simple get() behind the scenes, but that just seems a little silly.
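    To make that tie idea a bit more concrete, here is a rough sketch (the CUFPPages package name is made up for illustration, and it borrows the TO_FILE handle and the 300/15 paging numbers from the original script):

    package CUFPPages;    # hypothetical name, just for this sketch
    use strict;
    use warnings;
    use LWP::Simple ();

    sub TIEHANDLE {
        my ( $class, %args ) = @_;
        return bless { next => 0, last => $args{last} || 300 }, $class;
    }

    # Each readline fetches the next CUFP index page and returns its HTML,
    # or returns nothing once we run past the last offset.
    sub READLINE {
        my $self = shift;
        return if $self->{next} >= $self->{last};
        my $url = "http://www.perlmonks.org/?next=$self->{next};node_id=1044";
        $self->{next} += 15;
        return LWP::Simple::get($url);
    }

    package main;
    tie *PAGES, 'CUFPPages', last => 300;
    while ( my $page = <PAGES> ) {    # now it really is a while(<...>) loop
        print TO_FILE $page;          # assumes TO_FILE is open, as in the original
    }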

    It might please the gods if your script were a little more lazy. By that I mean, perhaps there should be a sleep 15 inside the loop, so that you're not banging away at the server quite so furiously. ...just a thought.


    Dave

      WWW::Mechanize::Sleepy handles playing nice.
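      For what it's worth, a minimal sketch (assuming the module's documented sleep => '5..15' random-interval syntax, and that other options such as autocheck pass through to WWW::Mechanize):

      use strict;
      use warnings;
      use WWW::Mechanize::Sleepy;

      # Pause a random 5-15 seconds between requests; don't die on a failed GET.
      my $mech = WWW::Mechanize::Sleepy->new( sleep => '5..15', autocheck => 0 );

      open TO_FILE, '>>', 'coolness.html' or die "Can't open coolness.html: $!";
      for ( my $i = 0; $i < 300; $i += 15 ) {
          $mech->get("http://www.perlmonks.org/?next=$i;node_id=1044");
          print TO_FILE $mech->content if $mech->success;
      }
      close TO_FILE;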
Re: Cool Uses, re-used
by Your Mother (Archbishop) on Nov 14, 2004 at 18:40 UTC
    Could the for( ) loop be replaced by a while(< >) construct?
    use IO::All;   # needs IO::All::LWP for this to work

    my $io = io->http('http://www.perlmonks.org/');
    while ( my $line = $io->getline ) {
        print $line;
    }
Re: Cool Uses, re-used
by elwarren (Priest) on Nov 15, 2004 at 20:20 UTC
    the for loop could be rewritten as:
    my $path = "http://www.perlmonks.org/?node_id=1044;next=";
    foreach my $i ( 0 .. 20 ) {
        my $url = $path . 15 * $i;
        # ...then fetch and print $url, as in the original loop body
    }
    Going over 300 doesn't error out; it just returns an empty listing and prompts for a new node. So I'm not sure how you could do a while construct unless you checked the length of the page returned and saw it was smaller than a page with listings.
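    Something along those lines might look like this (the 5000-byte cutoff is a pure guess; you'd have to check how big an empty listing page actually is):

    use strict;
    use warnings;
    use LWP::Simple;

    open TO_FILE, '>>', 'coolness.html' or die "Can't open coolness.html: $!";
    my $i = 0;
    while ( defined( my $doc = get("http://www.perlmonks.org/?node_id=1044;next=$i") ) ) {
        last if length($doc) < 5000;   # probably a page with no listings left
        print TO_FILE $doc;
        $i += 15;
        sleep 15;                      # be nice to the server, per davido
    }
    close TO_FILE;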

    You may prefer the print format to the standard layout; add displaytype to your URL: http://www.perlmonks.org/?displaytype=print;node_id=1044;next=100

    As the others have said, you shouldn't pound on the server, and you really should check that each get($url) succeeded (or use is_success on getstore's status code) before concatenating to your file; a sketch is below. And you're not going to get any of the "Read More..." text from these pages either.
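    That check could look something like this (note that is_success() wants an HTTP status code, which getstore() returns; plain get() just returns undef on failure):

    use strict;
    use warnings;
    use LWP::Simple;   # also exports is_success(), per its docs

    my $url = "http://www.perlmonks.org/?next=0;node_id=1044";
    my $rc  = getstore($url, 'cufp_0.html');   # filename is just an example
    warn "fetch of $url failed with status $rc\n" unless is_success($rc);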

    But it's interesting :-) I read through all the CUFP when I first discovered Perl Monks.

    Cheers