in reply to Perl script to retrieve a webpage using perl

The easiest way to retrieve a webpage using Perl is to use the LWP::Simple module:

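use strict;
use warnings;
use LWP::Simple;
my $page = get 'http://www.example.com';
die "Could not retrieve the page\n" unless defined $page;   # get returns undef on failure
print $page;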

Another way is to use the wget executable, if you have it installed:

use strict;
my $url = 'http://www.example.com';
my $page = `wget -q -O - "$url"`;
print $page;

If you want even more interaction with the page, take a look at WWW::Mechanize. If you want to parse the page after retrieving it to extract data, take a look at HTML::TableExtract and/or HTML::Parser.

perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web

Re: Re: Perl script to retrieve a webpage using perl
by liz (Monsignor) on Jul 20, 2003 at 12:30 UTC
    my $url = 'http://www.example.com';
    my $page = `wget -q -O - "$url"`;

    Written this way, it is OK. But note that if the contents of $url come from an untrusted source (e.g. a field in a form or part of a URL), then simply calling wget with that parameter is very dangerous, because the backticks hand the whole string to the shell.

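    Consider what would happen if $url were '"; find /"'. The leading double quote closes the quoted argument, the semicolon ends the wget command, and the shell then runs "find /" as a second command. Then consider what would happen if someone called a program other than "find".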
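To make the problem concrete, here is a small sketch that only builds and prints the command string that the backticks would pass to the shell. Nothing is executed, and the variable name $command is just for illustration.

```perl
use strict;
use warnings;

# The hostile value from the example above.
my $url = '"; find /"';

# Build the same string the backticks would hand to the shell,
# but only print it instead of running it.
my $command = qq{wget -q -O - "$url"};
print "$command\n";
```

The printed line is `wget -q -O - ""; find /""`. The shell would read that as two commands: wget with an empty argument, followed by find on the root directory.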

    Liz

      You can make the use of wget secure by using the shell's quoting mechanism and environment variables.
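A minimal sketch of the environment-variable idea follows. It assumes a POSIX shell is what runs the backticks, and it uses printf as a harmless stand-in for wget so it can be tried without network access. The helper name shell_echo_via_env and the variable name UNTRUSTED_VALUE are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: hand an untrusted value to a shell command through
# the environment instead of interpolating it into the command string.
sub shell_echo_via_env {
    my ($value) = @_;

    # local restores the previous environment when the sub returns.
    local $ENV{UNTRUSTED_VALUE} = $value;

    # The backslash stops Perl from interpolating, so the shell sees the
    # literal text "$UNTRUSTED_VALUE" and expands it inside double quotes.
    # The expanded value is not parsed again as shell syntax.
    return `printf '%s' "\$UNTRUSTED_VALUE"`;
}

print shell_echo_via_env('"; echo injected; "'), "\n";
```

The hostile string comes back unchanged and nothing extra is run. For wget the same pattern would be `wget -q -O - "\$UNTRUSTED_VALUE"` inside the backticks. This only addresses shell injection; a value that starts with a dash could still be read by the program as an option.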

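      You can also use the open(WGET,"-|") construct with exec to do this safely: the open forks a child whose output you can read, and exec called with a list of arguments runs wget without going through a shell.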
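Here is a sketch of the forking open with exec, assuming a platform where fork is available. The mode "-|" connects the child's standard output to a handle the parent reads from. The helper name capture_output is made up, and the demonstration runs the perl interpreter itself ($^X) rather than wget so it works without network access.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell and return its output.
sub capture_output {
    my @command = @_;

    # Forking open: returns the child's pid in the parent and 0 in the child.
    my $pid = open(my $reader, '-|');
    die "Cannot fork: $!\n" unless defined $pid;

    if ($pid == 0) {
        # Child: the list form of exec passes each argument as-is,
        # so no shell ever parses the untrusted value.
        exec { $command[0] } @command or die "Cannot exec $command[0]: $!\n";
    }

    # Parent: slurp everything the child wrote to its standard output.
    local $/;
    my $output = <$reader>;
    close $reader;
    return $output;
}

# Stand-in for wget: a child perl that prints its first argument.
print capture_output($^X, '-e', 'print $ARGV[0]', '"; find /"'), "\n";
```

With wget installed, the call would look like capture_output('wget', '-q', '-O', '-', $url). The hostile URL then reaches wget as a single, literal argument.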

      I agree in general that this is a less safe approach, but if it's the only option it can be done safely.