uday_sagar has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am new to dealing with web pages via Perl. Can you help me with a simple Perl construct that takes a URL as an argument and produces a file containing the source code of the web page?

Thanks.


Re: Displaying the source code of a web page
by davido (Cardinal) on Jun 19, 2012 at 05:47 UTC

    A few options (your question wasn't specific enough to narrow down what you're after):

    perl -MLWP::Simple=getprint -e 'getprint("http://perlmonks.org");' >filename_source.html

    ...or...

    mojo get perlmonks.org >filename_source.html

    ...or...

    perl -Mojo -E 'say g("perlmonks.org")->dom->html'

    The first construct requires LWP::Simple; the second and third require Mojolicious (which is probably only advisable if you have some other reason to have it on your system).

    If you're on a Unix/Linux system, you might already have curl installed.
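    curl isn't a Perl module, but it is easy to drive from Perl. Below is a minimal sketch, assuming a Unix-like system with curl on your PATH: it runs curl without a shell (list-form pipe open) and reads the page source into a Perl string before writing it to a file. The sub names `curl_command` and `fetch_with_curl` are hypothetical, chosen only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build curl's argument list (all standard curl options):
#   -s  silent, no progress meter     -S  still report errors
#   -f  fail on HTTP 4xx/5xx instead of returning the error page
#   -L  follow redirects
sub curl_command {
    my ($url) = @_;
    return ('curl', '-sSfL', $url);
}

# Run curl without a shell and slurp its output (the page source).
sub fetch_with_curl {
    my ($url) = @_;
    open my $pipe, '-|', curl_command($url)
        or die "Cannot run curl: $!\n";
    binmode $pipe;    # keep the bytes exactly as curl printed them
    my $source = do { local $/; <$pipe> };
    close $pipe
        or die "curl failed (exit status ", $? >> 8, ")\n";
    return $source;
}

# Usage: perl fetch_curl.pl http://perlmonks.org filename_source.html
if (@ARGV == 2) {
    my ($url, $outfile) = @ARGV;
    my $source = fetch_with_curl($url);
    open my $out, '>:raw', $outfile or die "Cannot write $outfile: $!\n";
    print {$out} $source;
    close $out or die "Cannot close $outfile: $!\n";
}
```

    Reading through a pipe rather than letting curl write the file itself (curl's `-o`) means you can inspect or modify the source in Perl before saving it.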

    If you want to incorporate it into a larger script:

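    use strict;
    use warnings;
    use LWP::Simple qw(get);

    # get() returns undef on failure, so check before writing
    my $raw_page = get( 'http://perlmonks.org' );
    defined $raw_page or die "Could not fetch the page\n";

    open my $html_ofh, '>', 'filename.txt' or die $!;
    print {$html_ofh} $raw_page;
    close $html_ofh or die $!;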

    Other notable modules include WWW::Mechanize, LWP::UserAgent, WWW::Mechanize::Firefox, ... and a whole bunch of HTML parsers and link extractors that you can find with your favorite CPAN search tool.
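    One more option, not in the list above: HTTP::Tiny has shipped with Perl itself since 5.14, so plain http:// URLs need nothing from CPAN (https:// additionally needs IO::Socket::SSL and Net::SSLeay). A minimal sketch that takes the URL and the output file as arguments, as the original question asked; `save_page` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Fetch $url and write the response body, byte for byte, to $file.
# HTTP::Tiny does not decode the body, so writing with :raw keeps the
# page source exactly as the server sent it.
sub save_page {
    my ($url, $file) = @_;
    my $res = HTTP::Tiny->new->get($url);
    die "Failed to fetch $url: $res->{status} $res->{reason}\n"
        unless $res->{success};
    open my $out, '>:raw', $file or die "Cannot open $file: $!\n";
    print {$out} $res->{content};
    close $out or die "Cannot close $file: $!\n";
    return;
}

# Usage: perl save_page.pl http://perlmonks.org filename_source.html
save_page(@ARGV) if @ARGV == 2;
```

    Checking `success` before opening the output file means a failed request (HTTP::Tiny reports its own internal errors as status 599) leaves no empty or half-written file behind.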


    Dave

Re: Displaying the source code of a web page
by frozenwithjoy (Priest) on Jun 19, 2012 at 05:33 UTC

    I would take a look at the modules listed in Task::Kensho::WebCrawling and choose the most appropriate one for your needs.

    It can be as easy as:
    use LWP::Simple; getprint "http://www.dot.com";
Re: Displaying the source code of a web page
by ansh batra (Friar) on Jun 19, 2012 at 05:35 UTC

    If you're using Linux, use the wget command:

    my $url = <>;
    chomp($url);
    print "$url\n";
    system("wget $url");

    Read the wget manual to refine this code to suit your needs.

      If you're going to suggest shelling out to wget, at least take the shell out of the equation:

      system("wget", $url);
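      Taking the list form one step further, here is a sketch that also checks whether wget actually succeeded, decoding `$?` the way perlfunc's `system` entry describes. It assumes GNU wget (for `-q`, `-O` and the `--` end-of-options marker); `wget_command` and `fetch_with_wget` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build wget's argument list: -q (quiet), -O FILE (save to FILE), and
# "--" so a URL that happens to start with "-" is not read as an option.
sub wget_command {
    my ($url, $file) = @_;
    return ('wget', '-q', '-O', $file, '--', $url);
}

# List-form system(): no shell is involved, so characters such as
# & ; | in the URL reach wget literally. Then inspect $?.
sub fetch_with_wget {
    my ($url, $file) = @_;
    system(wget_command($url, $file));
    if ($? == -1) {
        die "Could not run wget: $!\n";
    }
    elsif ($? & 127) {
        die "wget died with signal ", $? & 127, "\n";
    }
    elsif ($? >> 8) {
        die "wget exited with status ", $? >> 8, "\n";
    }
    return;
}

# Usage: perl fetch_wget.pl http://perlmonks.org filename_source.html
fetch_with_wget(@ARGV) if @ARGV == 2;
```

      Note that with `-O`, wget may leave an empty output file behind when the download fails, so you may want to `unlink` it in the error branches.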