URL page only retrieves partially

canguro has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am trying to retrieve the page at the following URL:

http://www.earnings.com/fin/earnListing.jsp?tckr=&exch=&eff=&date=2003-05-04

However, only the last (small) portion of the page comes back. Can someone straighten me out? Here is my code:

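use warnings;
use strict;
use LWP::Simple;

my $site = "http://www.earnings.com/fin/earnListing.jsp?tckr=&exch=&eff=&date=2003-05-04";

# get() returns undef if the request fails
my $content = get($site);
die "Could not fetch $site\n" unless defined $content;

# Replace each HTML tag with a space; [^>]* also matches tags that span lines
my $message = $content;
$message =~ s/<[^>]*>/ /g;
print $message;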

Thanks in advance. By the way, I am writing this code for my own practice. No commercial use is involved in any way.

2003-05-08 edit ybiC: <tt> for URL and <code> for code


Replies are listed 'Best First'.
Re: URL page only retrieves partially by BrowserUk (Patriarch) on May 09, 2003 at 02:05 UTC
I can't reproduce your problem. I copied and pasted your code straight into perl at the command prompt and ran it, and the output contained everything from the content of the <TITLE> tag at the top (`Earnings.com - Earnings Announcements`) down to the copyright notice (`1999-2002 Earnings.com, Inc., All rights reserved`) and the footer menu (`privacy policy | terms of service`) at the bottom. Your regex does introduce a large amount of whitespace, though.

Sorry to ask this, but are you sure the rest hasn't simply scrolled off the top of your screen? Have you tried redirecting the output to a file and looking at it in your editor?
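From the shell, the redirect is just `perl yourscript.pl > out.txt` (the script name is a placeholder). Alternatively, here is a minimal sketch in which the script writes the stripped text to a file itself. The file name `earnings.txt` is a hypothetical choice, and a short made-up HTML string stands in for what `get` returns so the example runs without a network connection.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the page returned by LWP::Simple's get()
my $content = "<html><head><title>Earnings.com - Earnings Announcements</title></head>\n"
            . "<body><p>privacy policy | terms of service</p></body></html>\n";

# Strip tags the same way as the original script
(my $message = $content) =~ s/<[^>]*>/ /g;

# Write to a file instead of the console, so nothing scrolls out of view
my $outfile = 'earnings.txt';    # hypothetical output file name
open my $out, '>', $outfile or die "Cannot open $outfile: $!";
print {$out} $message;
close $out or die "Cannot close $outfile: $!";

print "Wrote ", length($message), " characters to $outfile\n";
```

The whole file can then be opened in an editor, however long the output is.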
Re: Re: URL page only retrieves partially by canguro (Novice) on May 09, 2003 at 03:51 UTC
You are absolutely right. I am fairly new to Perl, retrieving web pages, and using an editor, so I did not realize that my editor must have a limit on the amount of data it displays. Apparently it displays only the most recently found data, up to that limit, and as new data comes in the older data scrolls off. Thanks for your sharp idea!
Re: URL page only retrieves partially by Abstraction (Friar) on May 09, 2003 at 01:40 UTC
Before we can help you, a little more information is needed. What do you want to have in `$message` when you print it?
Re: Re: URL page only retrieves partially by canguro (Novice) on May 09, 2003 at 02:33 UTC
I am looking for the data (specifically the earnings data) produced by the web address http://www.earnings.com/fin/earnListing.jsp?tckr=&exch=&eff=&date=2003-05-04, minus the whitespace, so that I can edit and format it further and then save it on my computer in CSV format for analysis.
Re: Re: Re: URL page only retrieves partially by Abstraction (Friar) on May 09, 2003 at 02:57 UTC
If you want to remove all the whitespace in `$message`, something like `$message =~ s/\s+//g;` will work (there may be a better regex for this). But it looks like you just want the data from the 'Earnings Announcements' table. If so, have a look at modules like HTML::TableContentParser or HTML::TableExtractor.
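Since the end goal is CSV, here is a rough core-Perl sketch that pulls the cell text out of each `<tr>` row, collapses whitespace inside each cell to a single space (deleting it outright would run words together), and joins the cells with commas. The sample HTML is made up, `csv_field` is a hypothetical helper, and regex extraction like this only copes with simple, regular markup; for the real page, one of the table-parsing modules mentioned above is the sturdier choice.

```perl
use strict;
use warnings;

# Hypothetical sample markup; the real earnings.com table layout may differ
my $html = <<'HTML';
<table>
  <tr><th>Company</th><th>Symbol</th><th>Time</th></tr>
  <tr><td>Example   Corp</td><td>EXMP</td><td>
      Before Market</td></tr>
  <tr><td>Sample, Inc.</td><td>SMPL</td><td>After Market</td></tr>
</table>
HTML

# Hypothetical helper: quote a field if it contains a comma, quote or newline
sub csv_field {
    my ($f) = @_;
    return $f unless $f =~ /[",\n]/;
    $f =~ s/"/""/g;
    return qq("$f");
}

my @lines;
# Crude regex extraction: /s lets .*? match across newlines inside a row
while ($html =~ m{<tr[^>]*>(.*?)</tr>}gis) {
    my $row = $1;
    my @cells;
    while ($row =~ m{<t[dh][^>]*>(.*?)</t[dh]>}gis) {
        my $cell = $1;
        $cell =~ s/<[^>]*>//g;    # drop any nested tags
        $cell =~ s/\s+/ /g;       # collapse whitespace runs to one space
        $cell =~ s/^ | $//g;      # trim a leading or trailing space
        push @cells, csv_field($cell);
    }
    push @lines, join(',', @cells) if @cells;
}

print "$_\n" for @lines;
```

This prints `Company,Symbol,Time`, then `Example Corp,EXMP,Before Market`, then `"Sample, Inc.",SMPL,After Market`.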
Re: Re: Re: Re: URL page only retrieves partially by canguro (Novice) on May 09, 2003 at 09:10 UTC
Re: Re: Re: Re: URL page only retrieves partially by canguro (Novice) on May 09, 2003 at 04:01 UTC
Re: Re: Re: Re: Re: URL page only retrieves partially by Abstraction (Friar) on May 09, 2003 at 04:43 UTC