Re: URL page only retrieves partially

Replies are listed 'Best First'.
Re: Re: URL page only retrieves partially by canguro (Novice) on May 09, 2003 at 02:33 UTC
I want to retrieve the data (specifically earnings data) served at http://www.earnings.com/fin/earnListing.jsp?tckr=&exch=&eff=&date=2003-05-04 with the whitespace removed, so that I can edit and format it further and then save it in CSV format on my computer for analysis.
Re: Re: Re: URL page only retrieves partially by Abstraction (Friar) on May 09, 2003 at 02:57 UTC
If you want to remove all the whitespace in `$message`, something like this will work:

    # There may be a better regex for this...
    $message =~ s/\s+//g;

But it looks like you just want the data from the 'Earnings Announcements' table. If that is the case, have a look at modules like HTML::TableContentParser or HTML::TableExtract.
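On the "better regex" point: deleting every whitespace character also glues adjacent words together, which may not be what is wanted before a CSV export. A minimal sketch of the difference, assuming `$message` holds the page text (the sample string below is made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample standing in for the fetched page content.
my $message = "  Company \t Symbol\n  Example Corp   EXM \n";

# Option 1: delete every whitespace character (adjacent words run together).
(my $stripped = $message) =~ s/\s+//g;

# Option 2: collapse each run of whitespace to a single space,
# then trim the one leading and one trailing space that remain.
(my $collapsed = $message) =~ s/\s+/ /g;
$collapsed =~ s/^ | $//g;

print "$stripped\n";
print "$collapsed\n";
```

Option 2 keeps word boundaries intact, so multi-word values such as company names stay readable.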
Re: Re: Re: Re: URL page only retrieves partially by canguro (Novice) on May 09, 2003 at 09:10 UTC
Thank you for your suggestions. Your idea for $message works fine, but I thought I'd follow up on your suggestions for the CPAN modules -- actually I'm trying to use the module HTML::TableExtract. Here is the tentative code I have built so far, but I do not know how to proceed. Can you guide me on how to complete the code so I can refer to the table data (for the columns Company, Symbol, EPS Estimate, EPS Actual)? I hope to print all rows in the table for all 4 columns.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use LWP::Simple;
    use HTML::TableExtract;

    my ($te, $url);
    $url="http://www.earnings.com/fin/earnListing.jsp?tckr=&exch=&eff=&date=2003-05-04";
    $te = new HTML::TableExtract( headers => qw(Company Symbol EPS Estimate EPS Actual) );
    $te->parse($url);

With many thanks, Joe
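Two things in the snippet above would likely get in the way. `headers` expects an array reference, and `qw(...)` splits "EPS Estimate" into two separate words. Also, `parse` expects HTML text rather than a URL. One way it might be completed is sketched below, not a definitive version. The inline `$html` and the `extract_rows` helper are made up for illustration so the example runs offline. For the live page, the content would be fetched first, for example with `get($url)` from LWP::Simple. The `tables` method is from current HTML::TableExtract releases (older ones call it `table_states`).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the page content. For the live page, fetch it
# first, e.g. with LWP::Simple:  my $html = get($url);
my $html = <<'HTML';
<table>
  <tr><th>Company</th><th>Symbol</th><th>EPS Estimate</th><th>EPS Actual</th></tr>
  <tr><td>Example Corp</td><td>EXM</td><td>0.10</td><td>0.12</td></tr>
  <tr><td>Sample Inc</td><td>SMP</td><td>0.25</td><td>0.20</td></tr>
</table>
HTML

# Hypothetical helper: return the data rows of every table whose header
# row matches all four requested headers.
sub extract_rows {
    my ($content) = @_;

    # headers takes an array reference; each entry is one full header string.
    my $te = HTML::TableExtract->new(
        headers => [ 'Company', 'Symbol', 'EPS Estimate', 'EPS Actual' ],
    );
    $te->parse($content);    # parse() expects HTML text, not a URL
    $te->eof;

    my @rows;
    for my $table ($te->tables) {
        for my $row ($table->rows) {
            # Empty cells can come back undefined; replace them with ''.
            push @rows, [ map { defined $_ ? $_ : '' } @$row ];
        }
    }
    return @rows;
}

# Naive comma-joined output; values containing commas or quotes would need
# proper CSV quoting (a module such as Text::CSV can handle that).
print join(',', @$_), "\n" for extract_rows($html);
```

Each returned row is an array reference with its cells in the same order as the `headers` list, so `$row->[0]` is the company and `$row->[3]` is the actual EPS.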
Re: Re: Re: Re: URL page only retrieves partially by canguro (Novice) on May 09, 2003 at 04:01 UTC
Thanks a lot for your code suggestion. That is definitely the direction I want to go. By the way, I'm very intrigued by your suggestion to use the modules you mentioned. As I understand it, they are available on CPAN, correct? Are there any clear instructions for their installation and usage? Again, many thanks.
Re: Re: Re: Re: Re: URL page only retrieves partially by Abstraction (Friar) on May 09, 2003 at 04:43 UTC