Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, how can I retrieve web page data using Perl?

20030720 Edit by Corion: Changed title from "Perl script"

Re: Perl script to retrieve a webpage using perl
by Corion (Patriarch) on Jul 20, 2003 at 12:22 UTC

    The easiest way to retrieve a webpage using Perl is to use the LWP::Simple module:

    use strict;
    use LWP::Simple;

    my $page = get 'http://www.example.com';
    print $page;

    Another way is to use the wget executable, if you have it installed:

    use strict;

    my $url = 'http://www.example.com';
    my $page = `wget -q -O - "$url"`;
    print $page;

    If you want even more interaction with the page, take a look at WWW::Mechanize. If you want to parse the page after retrieving it to extract data, take a look at HTML::TableExtract and/or HTML::Parser.
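As an illustration of the parsing step, here is a minimal sketch that pulls link targets out of an HTML string with HTML::Parser. It assumes the module is installed; extract_links and the sample HTML are made up for this example, and in practice the string would be the page returned by get.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: collect the href of every <a> tag in an HTML string.
sub extract_links {
    my ($html) = @_;
    my @links;
    my $parser = HTML::Parser->new(
        api_version => 3,
        # Called for each start tag with the lowercased tag name
        # and a hash reference of its attributes.
        start_h => [
            sub {
                my ($tag, $attr) = @_;
                push @links, $attr->{href}
                    if $tag eq 'a' && defined $attr->{href};
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;    # signal end of document
    return @links;
}

# Sample input, invented for this sketch.
my $html = '<p><a href="http://www.example.com/a">A</a> <a href="/b">B</a></p>';
print "$_\n" for extract_links($html);
```

For tables specifically, HTML::TableExtract is likely the shorter route; this sketch only shows the general callback style.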

    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
    $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
    ($c = $d->accept())->get_request(); $c->send_response( new #in the
    HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
      my $url = 'http://www.example.com';
      my $page = `wget -q -O - "$url"`;

      Written this way, it is OK. But note that if the contents of $url come from an untrusted source (e.g. a field in a form or part of a URL), then simply calling wget with the parameter as shown is very dangerous.

      Consider what would happen if $url were '"; find /"'. Then consider what would happen if someone called a program other than "find".

      Liz
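To make the danger concrete, it can help to print the command string the shell would receive instead of running it. A minimal sketch using the hostile value from the post above; nothing is executed here.

```perl
use strict;
use warnings;

# The hostile value from the post: it closes the quote and starts a new command.
my $url = '"; find /"';

# Build the same string the backticks would hand to the shell, but only print it.
my $command = qq{wget -q -O - "$url"};
print "$command\n";    # prints: wget -q -O - ""; find /""
```

The shell reads that as two commands: wget with an empty URL, followed by find /"", which is simply find /.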

        You can make the use of wget secure by using the shell's quoting mechanism and environment variables.
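A minimal sketch of the environment-variable idea, assuming a POSIX-style shell. The untrusted value never becomes part of the command text; the shell expands it inside double quotes and does not parse it again. The helper name run_with_env_arg and the variable name FETCH_ARG are made up for this example, and the command prefix must come from the program itself, never from user input.

```perl
use strict;
use warnings;

# Hypothetical helper: run a trusted command prefix with one untrusted argument.
sub run_with_env_arg {
    my ($command, $value) = @_;

    # Pass the untrusted value through the environment, not the command line.
    local $ENV{FETCH_ARG} = $value;

    # \$ keeps Perl from interpolating, so the shell sees "$FETCH_ARG"
    # and expands it as a single quoted word.
    return qx{$command "\$FETCH_ARG"};
}

# With wget this would be:
#   my $page = run_with_env_arg('wget -q -O -', $url);
# Demonstrated with echo so no network access is needed:
print run_with_env_arg('echo', '"; echo injected');
```

The hostile string comes back as plain text rather than being run as a second command.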

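        You can also use the forking open(WGET,"-|") construct with exec to do this safely: the child execs wget with a list of arguments, so no shell is involved, and the parent reads the output from WGET.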
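A minimal sketch of the forking open with exec, assuming a Unix-like system. It uses the "-|" mode, the direction in which the parent reads what the child writes to STDOUT. The helper name capture_no_shell is made up for this example.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run a command given as a list and return its output.
sub capture_no_shell {
    my (@cmd) = @_;

    # Fork; in the parent $fh reads from the child's STDOUT.
    my $pid = open(my $fh, '-|');
    die "cannot fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: exec with a list never goes through a shell.
        exec { $cmd[0] } @cmd;
        warn "cannot exec $cmd[0]: $!";
        POSIX::_exit(127);
    }

    # Parent: read everything the child printed.
    local $/;
    my $output = <$fh>;
    close $fh;
    return $output;
}

# With wget this would be:
#   my $page = capture_no_shell('wget', '-q', '-O', '-', $url);
# Demonstrated with echo so no network access is needed:
print capture_no_shell('echo', '"; echo injected');
```

Because each list element reaches the program as one argument, quotes and semicolons in $url have no special meaning.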

        I agree in general that this is a less safe approach, but if it's the only option it can be done safely.