in reply to Re: URL page only retrieves partially
in thread URL page only retrieves partially

I am looking to see the data (specifically earnings data) produced by the web address http://www.earnings.com/fin/earnListing.jsp?tckr=&exch=&eff=&date=2003-05-04 minus whitespace, so that I can further edit and format it and then save it in CSV format on my computer for analysis.
Re: Re: Re: URL page only retrieves partially
by Abstraction (Friar) on May 09, 2003 at 02:57 UTC
    If you want to remove all the whitespace in $message, something like this will work.
    # There may be a better regex for this...
    $message =~ s/\s+//g;
    But it looks like you just want the data from the 'Earnings Announcements' table. If this is the case, have a look at modules like HTML::TableContentParser or HTML::TableExtract.
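A small aside on the whitespace substitution above: removing every whitespace character also joins adjacent words, which may not be what you want before further formatting. The following is only a sketch on a made-up sample string (the contents of $message here are an assumption, not the real page) comparing full removal with collapsing runs of whitespace to a single space.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text; in the thread, $message holds the fetched page.
my $message = "  Example Corp \n\t EXMP   0.10  ";

# Removing every whitespace character runs adjacent words together.
(my $stripped = $message) =~ s/\s+//g;

# Collapsing each run to one space, then trimming both ends, keeps words apart.
(my $collapsed = $message) =~ s/\s+/ /g;
$collapsed =~ s/^ | $//g;

print "[$stripped]\n";
print "[$collapsed]\n";
```

Which variant fits depends on whether the text between tags still needs to be readable afterwards.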

      Thank you for your suggestions. Your idea for $message works fine, but I thought I'd follow up on your suggestions for the CPAN modules -- actually I'm trying to use the module HTML::TableExtract. Here is the tentative code I have built so far, but I do not know how to proceed. Can you guide me on how to complete the code so I can refer to the table data (for the columns Company, Symbol, EPS Estimate, EPS Actual)? I hope to print every row in the table for all 4 columns.

      #!/usr/bin/perl
      use warnings;
      use strict;
      use LWP::Simple;
      use HTML::TableExtract;
      my ($te, $url);
      $url="http://www.earnings.com/fin/earnListing.jsp?tckr=&exch=&eff=&date=2003-05-04";
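      # headers takes an array reference; quote the two-word headers so each stays one string
      $te = HTML::TableExtract->new( headers =>
      [ 'Company', 'Symbol', 'EPS Estimate', 'EPS Actual' ] );
      # parse() expects HTML text, not a URL, so fetch the page first
      my $html = get($url) or die "Could not fetch $url\n";
      $te->parse($html);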

      With many thanks, Joe
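One possible way to finish such a script, offered as a sketch rather than a definitive answer: give headers an array reference in which each multi-word header is a single string, hand parse() the HTML text itself, then walk the matched tables row by row. The inline HTML below is a made-up stand-in for the real page (its layout and the exact header text are assumptions); in a live script the string would come from LWP::Simple's get($url) instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the fetched page; the real markup may differ.
my $html = <<'HTML';
<html><body>
<table>
<tr><th>Company</th><th>Symbol</th><th>EPS Estimate</th><th>EPS Actual</th></tr>
<tr><td>Example Corp</td><td>EXMP</td><td>0.10</td><td>0.12</td></tr>
<tr><td>Sample Inc</td><td>SMPL</td><td>0.25</td><td>0.20</td></tr>
</table>
</body></html>
HTML

# Each header is matched against the table's header cells.
my $te = HTML::TableExtract->new(
    headers => [ 'Company', 'Symbol', 'EPS Estimate', 'EPS Actual' ],
);
$te->parse($html);

my @csv_lines;
for my $table ($te->tables) {
    for my $row ($table->rows) {
        # Empty cells can come back undefined; treat them as empty strings.
        my @cells = map { defined $_ ? $_ : '' } @$row;
        s/^\s+|\s+$//g for @cells;          # trim stray whitespace in each cell
        push @csv_lines, join(',', @cells); # naive join: no quoting of embedded commas
    }
}

print "$_\n" for @csv_lines;
```

The plain join is enough for simple values; if company names can contain commas, the fields would need proper CSV quoting before being written to a file.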

      Thanks a lot for your code suggestion. That is definitely the direction I want to go. By the way, I'm very intrigued by your suggestions for using the modules you mentioned. As I understand, they are available on CPAN, correct? Are there any clear instructions on their installation and usage? Again, many thanks.
        As I understand, they are available on CPAN, correct?

        Yup, http://search.cpan.org


        Are there any clear instructions on their installation and usage?

        It depends on what OS you are using. On Unix/Linux, something along the lines of perl -MCPAN -e shell will start the CPAN shell. If you are on Windows, there is an application named PPM in the bin directory of your perl install. After that, it's generally as easy as typing install Desired::Module at the prompt.


        Good Luck.