in reply to Foreach Array and Html table extract

The specific error message you're getting is because you don't actually declare $line. You could do what with foreach my $line ( @lines ) {....

At some point you'll probably want to set up a parallel user agent. It's likely that your biggest bottleneck will be in fetching the documents, otherwise.

If you ask a dozen individuals how to implement a parallel user agent you'll probably get a dozen different answers. Some will include explicit use of fork or threads, while others might recommend a module that works well for them. I've used both LWP::Parallel::UserAgent, and Mojolicious's built-in Mojo::UserAgent. I think a lot more ongoing work and maintenance has gone into the latter, and since I use Mojolicious for other purposes anyway (and as it can be installed in under a minute), I lean toward the Mojo::UserAgent approach nowadays. Mojo::UserAgent combined with Mojo::IOLoop (an event loop) and Mojo::DOM (HTML/XHTML DOM parser with CSS selector support) is a powerful ally.


Dave

Replies are listed 'Best First'.
Re^2: Foreach Array and Html table extract
by doctordoctor (Initiate) on Aug 14, 2012 at 20:09 UTC

    Thank you for the quick response, that moves me further along in the code, but the error I now receive is "Can't call method "rows" on an undefined value at C:\perlscripts\test.pl line 31" <\p>

      Why don't you provide an updated snippet of code for us to play with, and some sample HTML that results in the error. Just wrap the HTML in code tags. It's easier to debug an error that we can easily reproduce.

      Also, you may want to use the 'debug' method from HTML::TableExtract to inspect the assertions your code makes about the state of affairs immediately before the call to 'rows'.


      Dave

        The code I'm sending the command prompt is the following: <\p>

        #!/usr/bin/perl use 5.014; # so push/pop/etc work on scalars (experimental) use strict; use warnings; use LWP::Simple 'get'; use HTML::TableExtract; my $file = 'C:\Payout Policy Paper\Data\urllist.csv'; open (FH, "< $file") or die "Can't open $file for read: $!"; my @lines = <FH>; close FH or die "Cannot close $file: $!"; print @lines; foreach my $line (@lines) { my $te = HTML::TableExtract->new( headers => [ 'Purchased','Average','Publicly','May'], slice_columns => 0,keep_html => 0,br_translate => 0 ); $te->parse($line); my $table = $te->first_table_found; my $file = "testout.csv"; open (F,">", $file); for my $row ($table->rows) { print F join('^', @$row), "\n"; } close (F); }

        If I'm correctly understanding your suggestion you can find an example of the html I am analyzing at:

         http://www.sec.gov/Archives/edgar/data/826083/000082608312000011/dellq1fy1310q.htm