triticale has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to download the following webpage:

http://www.nasdaq.com/earnings/earnings-calendar.aspx?date=2014-Oct-02

First I tried LWP::Simple, but nothing was downloaded. Then I used WWW::Mechanize and finally got a file created, but when I opened it I saw nothing but weird characters. This is the code I used:

use strict;
use warnings;
use WWW::Mechanize;

my $url             = "http://www.nasdaq.com/earnings/earnings-calendar.aspx?date=2014-Oct-02";
my $local_file_name = 'getkml.txt';

my $mech = WWW::Mechanize->new;
$mech->get( $url, ":content_file" => $local_file_name );

My questions are:

1. Should I convert the file to a plain text file? If so, how can I do that?

2. Is there an alternative way to download this webpage?

Re: Problem saving webpage content into a file
by Corion (Patriarch) on Oct 06, 2014 at 12:36 UTC

    The headers for the URL say

    Content-Encoding: gzip

    So, for the short term fix, you will have to use gunzip to uncompress the saved data.
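A minimal sketch of that short-term fix in Perl rather than on the command line, assuming the saved file really is gzip data. It uses the core module IO::Uncompress::Gunzip; the helper name `gunzip_file`, the output name `getkml.html`, and the sample page text are made up for illustration, and a temporary gzip file stands in for the download so the snippet runs offline.

```perl
use strict;
use warnings;
use File::Temp             qw(tempdir);
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: uncompress the gzip file $in into the new file $out.
sub gunzip_file {
    my ($in, $out) = @_;
    gunzip($in => $out) or die "gunzip failed: $GunzipError\n";
    return $out;
}

# Offline stand-in for the raw file that ":content_file" wrote to disk.
my $dir  = tempdir(CLEANUP => 1);
my $raw  = "$dir/getkml.txt";     # file name from the question
my $html = "$dir/getkml.html";    # assumed name for the readable copy
my $page = "<html><body>earnings calendar</body></html>";
gzip(\$page => $raw) or die "gzip failed: $GzipError\n";

gunzip_file($raw, $html);
print "wrote $html\n";
```

For the real download, only the `gunzip_file` call is needed, pointed at the file that was already saved.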

    In the long run, you will have to learn HTTP and learn how the headers and the content interact. Also, maybe using the ->content method of WWW::Mechanize instead of saving the raw, undecoded content directly to a file helps.
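To make the header/content interaction concrete, here is a small sketch that builds a fake gzip-encoded HTTP::Response by hand (so no network is needed) and compares `content` with `decoded_content`. The page text is invented for illustration, and HTTP::Response is assumed to be installed, as it is wherever LWP is.

```perl
use strict;
use warnings;
use HTTP::Response;                          # part of HTTP::Message, installed with LWP
use IO::Compress::Gzip qw(gzip $GzipError);

# Fake response: a gzip-compressed body plus the header that announces it.
my $page = "<html><body>earnings calendar</body></html>";
my $compressed;
gzip(\$page => \$compressed) or die "gzip failed: $GzipError\n";

my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html', 'Content-Encoding' => 'gzip' ],
    $compressed,
);

my $raw     = $response->content;            # bytes as sent, still compressed
my $decoded = $response->decoded_content;    # Content-Encoding undone

printf "raw looks like gzip: %s\n", ( $raw =~ /\A\x1f\x8b/ ? 'yes' : 'no' );
print  "decoded: $decoded\n";
```

As far as I know, WWW::Mechanize's `content` method returns the decoded form, so calling `$mech->get($url)` without `:content_file` and then writing out `$mech->content` (or calling `$mech->save_content($file)`) should produce a readable file.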

Re: Problem saving webpage content into a file
by pme (Monsignor) on Oct 06, 2014 at 12:40 UTC
    Hi triticale, this may help:

    use LWP::UserAgent;

    my $url = 'http://www.nasdaq.com/earnings/earnings-calendar.aspx?date=2014-Oct-02';
    my $ua  = LWP::UserAgent->new;
    my $response = $ua->get($url);
    if ($response->is_success) {
        print "$response->content\n";
    }
    else {
        warn("Can't get $url -- " . $response->status_line);
    }

    Update: Following Corion's answer, you can gunzip the content this way:

    use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

    my $raw_content = $response->content;
    my $uncomp_content;
    gunzip \$raw_content => \$uncomp_content
        or die "gunzip failed: $GunzipError";
      print "$response->content\n";

      This likely won't print what you intend. Double-quoted strings don't interpolate method calls; only the variable $response is interpolated, and "->content" is printed literally.

      print $response->content, "\n";
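A small offline sketch of the difference, using a made-up `Demo::Response` class in place of a real response object (the class and its body text are purely illustrative):

```perl
use strict;
use warnings;

# Hypothetical stand-in for a response object, so the sketch runs offline.
{
    package Demo::Response;
    sub new     { my ($class, $body) = @_; return bless { body => $body }, $class }
    sub content { return $_[0]{body} }
}

my $response = Demo::Response->new('<html>ok</html>');

my $wrong = "$response->content";            # only $response is interpolated
my $right = $response->content . "\n";       # method called outside the quotes
my $trick = "@{[ $response->content ]}\n";   # interpolating an anonymous array

print "wrong: $wrong\n";    # object address followed by the literal text ->content
print "right: $right";
print "trick: $trick";
```

The `@{[ ... ]}` form works because array dereferences are interpolated, but passing the method call as a separate argument to print is usually easier to read.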

      I think that the content has already been decompressed by WWW::Mechanize for you.