eshwar has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I need some help. I have opened a web page in IE using Win32::IE::Mechanize. I want to save the opened page, but the $ie->content() function returns only the HEAD. I need the whole HTML file. Is there a way of doing this?

Thanks,
Eshwar.

Re: Can a whole web page be saved using Perl?
by leocharre (Priest) on Aug 14, 2009 at 17:10 UTC
Re: Can a whole web page be saved using Perl?
by ikegami (Patriarch) on Aug 14, 2009 at 17:21 UTC
    It seems to me it should return the entire page. Are you sure there is more than the head to this page? Unfortunately, my Windows machine is down at the moment.
Re: Can a whole web page be saved using Perl?
by vitoco (Hermit) on Aug 14, 2009 at 18:19 UTC
    use Win32::IE::Mechanize qw( );

    my $mech = Win32::IE::Mechanize->new();
    my $url = 'http://www.vitoco.cl/test-ref';
    $mech->get($url);
    print $mech->content();

    It is strange: three consecutive runs gave:

    C:\test>test-ie.pl
    <HTML><HEAD><TITLE>Referrer test</TITLE></HEAD> <BODY></BODY></HTML>

    C:\test>test-ie.pl
    <HTML><HEAD><TITLE>Referrer test</TITLE></HEAD> <BODY><IMG alt=img1 src="img1"> <IMG alt=img2 src="img2"> </BODY></HTML>

    C:\test>test-ie.pl
    <HTML><HEAD><TITLE>Referrer test</TITLE></HEAD> <BODY><IMG alt=img1 src="img1"> <IMG alt=img2 src="img2"> </BODY></HTML>

    Only the first run had an empty body. Also, the tags were uppercased and the line breaks were stripped out.

    From the docs: "WARNING: This is a work in progress...", last update in 2005(?).

    Did you try WWW::Mechanize instead?

Re: Can a whole web page be saved using Perl?
by Marshall (Canon) on Aug 15, 2009 at 00:18 UTC
    First, web servers can and do send different pages depending on the browser you claim to be!

    Here is a simple script that claims to be a VERY stupid browser. It prints the HTML the server sends.

    #!/usr/bin/perl -w
    use strict;
    use LWP::UserAgent;

    my $url ='http://www.fcc.gov';

    my $ua = LWP::UserAgent->new or die "Problem with the new UserAgent\n";
    $ua->agent("Mozilla/4.76 [en] (Windows NT 5.0; U)");

    print "And now I'm calling myself ", $ua->agent( ), "!\n";

    my $response = $ua->get($url) or die "Problem with the get $url\n";
    $response->is_success or
        die "Failed to GET '$url': ", $response->status_line;

    my $html_page = $response->content( );
    print $html_page;
    UPDATE: I don't claim to know how to work "mechanize", but the above will print HTML from a site that is "UP" most of the time.
Re: Can a whole web page be saved using Perl?
by Anonymous Monk on Aug 15, 2009 at 00:36 UTC
    Please post code that demonstrates this problem or it didn't happen :)

    This works for me

    # this is what $agent->content does
    print "outerHTML \n\n", $agent->Document->documentElement->{outerHTML},"\n";

    # this returns same content
    print "innerHTML \n\n", $agent->Document->documentElement->{innerHTML},"\n";
      Sorry for the late reply; I was tied up with other things. I can't use WWW::Mechanize because it does not execute JavaScript, and $ie->content() returns the following:

      <HTML><HEAD><LINK rel=stylesheet href="c1.css"></HEAD></HTML>

      Sorry for my ignorance, but doesn't the code below need an LWP package rather than Win32::IE::Mechanize?

      $agent->Document->documentElement->{innerHTML}

      Thanks,
      Eshwar
        That is not code that demonstrates your problem. content works as advertised for me.
        my $ie = Win32::IE::Mechanize->new ...
        my $agent = $ie->agent;
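
For reference, once the full document does come back, saving it is only a matter of writing the string to a file. Below is a minimal sketch, assuming the outerHTML access shown earlier in the thread works on your machine. The helper name save_html, the file name page.html and the example.com URL are made up for illustration. The sketch does not wait for scripts on the page to finish, so it may still capture an incomplete document.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): write an HTML string
# to a file exactly as given, with no newline or encoding translation.
sub save_html {
    my ($path, $html) = @_;
    open my $fh, '>:raw', $path
        or die "Cannot open '$path' for writing: $!\n";
    print {$fh} $html;
    close $fh
        or die "Cannot close '$path': $!\n";
    return -s $path;    # size on disk in bytes
}

# Only attempt the IE part where the module can be loaded
# (in practice: Windows with Internet Explorer installed).
if ( eval { require Win32::IE::Mechanize; 1 } ) {
    my $url = 'http://www.example.com/';    # placeholder URL
    my $ie  = Win32::IE::Mechanize->new();
    $ie->get($url);

    # Same access path as in the reply above: the underlying OLE
    # object, then the document's root element.
    my $html = $ie->agent->Document->documentElement->{outerHTML};

    my $bytes = save_html('page.html', $html);
    print "Saved $bytes bytes to page.html\n";
}
```

If the saved file still contains only the HEAD, the file-writing step is not the culprit; the document was most likely read before IE had finished building it.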