Re: Retrieving HTML with LWP::UserAgent

I'm not able to replicate your problem here. The URL you said was returning no content is working just fine; it is a little slow, however. Perhaps the problem is in the calling code; how is it dealing with the content? How have you determined it has no content?

In addition, the parens on your subroutine declaration are wrong. sub retrieve_url () {...} declares retrieve_url as taking no arguments; the parens there are a prototype (see perlsub). You should be using sub retrieve_url { ... }. You probably haven't seen any errors from this because the call is compiled before the declaration is seen, so the prototype can't be enforced, but you should be seeing warnings about that (you are using warnings, right?).
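To make the difference concrete, here is a minimal sketch that uses the built-in prototype function to inspect two subs. The names no_args, any_args and describe are hypothetical stand-ins, not the original poster's code, and nothing here touches the network.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for illustration only.
sub no_args () { return scalar @_ }    # empty prototype: takes no arguments
sub any_args   { return scalar @_ }    # no prototype: takes any argument list

# prototype() returns the prototype string, or undef when there is none.
sub describe {
    my ( $name, $code ) = @_;
    my $proto = prototype( $code );
    return defined $proto
        ? "$name has prototype '($proto)'"
        : "$name has no prototype";
}

print describe( 'no_args',  \&no_args ),  "\n";
print describe( 'any_args', \&any_args ), "\n";
print "any_args saw ", any_args( 'http://www.finasta.lt' ), " argument(s)\n";
```

The sub declared without parens is the one that behaves the way retrieve_url was meant to: it accepts the URL and finds it in @_.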


Replies are listed 'Best First'.

Re: Re: Retrieving HTML with LWP::UserAgent
by pg (Canon) on May 27, 2004 at 01:31 UTC

This reply is very impressive.

Just to extend a little bit on this: there are two ways we might call this sub. We either call it before it is defined, or call it after it is defined. Let's look at both of them:

Call before it is defined.

use LWP::UserAgent;

use strict;
use warnings;

print retrieve_url( 'http://www.finasta.lt' );

sub retrieve_url() {
    my $url = shift;

    my $ua = LWP::UserAgent->new();
    my $res = $ua->get( $url );

    if ( $res->is_success() ) {
        my $content = $res->content();
        $content =~ s!\A\s+!!;

        return( $content );
    }
    else {
        die( "retrieval error: ", $res->status_line() );
    }
}

In this case, you would be warned that:

main::retrieve_url() called too early to check prototype at a.pl line 6.

However, Perl will close one eye and let the code run "successfully".
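If you want to see both halves of that behaviour in one run, a rough sketch along these lines should do it. It compiles the "call first, define later" pattern inside a string eval so the warning can be caught with a local $SIG{__WARN__} handler; early() is a hypothetical stand-in for retrieve_url() that just counts its arguments, so no network is needed.

```perl
use strict;
use warnings;

my @caught;
my $result;
{
    # Collect warnings instead of letting them go to STDERR.
    local $SIG{__WARN__} = sub { push @caught, @_ };

    # The string is compiled at run time, while the handler is active.
    $result = eval q{
        my $n = early( 'http://www.finasta.lt' );   # call comes first
        sub early () { return scalar @_ }           # prototype seen too late
        $n;
    };
    die $@ if $@;
}

print "warning: $_" for @caught;
print "call still ran and received $result argument(s)\n";
```

The call goes through with its argument even though the sub claims to take none, which is exactly the "close one eye" case.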

Call after the sub is defined.

If you want to get rid of that annoying warning, you have to define the sub first. Perl then sees the empty prototype before it compiles the call and refuses to compile the program at all, which is ideal to me.

use LWP::UserAgent;

use strict;
use warnings;

sub retrieve_url() {
    my $url = shift;

    my $ua = LWP::UserAgent->new();
    my $res = $ua->get( $url );

    if ( $res->is_success() ) {
        my $content = $res->content();
        $content =~ s!\A\s+!!;

        return( $content );
    }
    else {
        die( "retrieval error: ", $res->status_line() );
    }
}

print retrieve_url( 'http://www.finasta.lt' );

Try it, and you get:

Too many arguments for main::retrieve_url at a.pl line 23, near "'http://www.finasta.lt' )"
Execution of a.pl aborted due to compilation errors.
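The same failure can be reproduced without aborting the whole script by wrapping the "define first, then call" pattern in a string eval and looking at $@. This is only a sketch: late() is a hypothetical stand-in for retrieve_url(), and it counts arguments rather than fetching anything.

```perl
use strict;
use warnings;

# Because the sub is defined before the call, its empty prototype is
# already known when the call is compiled.
my $ok = eval q{
    sub late () { return scalar @_ }
    late( 'http://www.finasta.lt' );
    1;
};
my $error = $@;

print $ok ? "compiled and ran\n" : "compilation failed: $error";
```

Dropping the parens from the declaration (sub late { ... }) should make the same eval compile and run.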
