Help extracting tables from an html file

bhuber has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to extract table information from an HTML file I have, and so far I'm getting no output at all. Most likely a newb problem. Any ideas? Here's the code:

#!/usr/bin/perl

my $FILENAME = "file.html";
open (FILE, $FILENAME);
use HTML::TableContentParser;

$table_cp = HTML::TableContentParser->new;
$tables = $table_cp->parse($FILENAME);

my $info = $tables->[0];

foreach $row (@{$info->{rows}})
{
    foreach $col (@{$row->{cols}})
    {
        $data = $col->{data};
        print "$data\n";
    }
}


Replies are listed 'Best First'.

Re: Help extracting tables from an html file
by talexb (Chancellor) on Nov 16, 2007 at 21:28 UTC

It looks like your code is a lot like the example in the HTML::TableContentParser documentation.

Unfortunately, there are enough differences that your version doesn't work. The second line of code opens the file you're interested in, but you never read from it. Later on, you pass the filename, rather than the file's contents, to the method that's supposed to be parsing HTML. The string "file.html" isn't valid HTML, let alone an HTML table, so I'm not surprised you get nothing.

Start again, read the documentation carefully, and call the module as the documentation directs. Let us know how you get along.
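
For reference, here is a rough sketch of the shape this usually takes: read the file's contents into a string, then hand that string to parse(). It assumes the rows / cells / data keys as I remember them from the module's synopsis (note cells, not cols as in the posted code), so check them against the documentation for your installed version. The helper names slurp_file and first_table_cells are made up for this example and are not part of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableContentParser;

# Read a whole file into one string. parse() wants HTML text, not a filename.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                      # no record separator: read everything at once
    my $html = <$fh>;
    close $fh;
    return $html;
}

# Collect the text of every cell in the first table, row by row.
# The hash keys (rows, cells, data) are assumed from the module's synopsis.
sub first_table_cells {
    my ($html) = @_;
    my $tables = HTML::TableContentParser->new->parse($html);
    my $table  = $tables->[0] or return;   # no table found: return nothing
    my @cells;
    for my $row (@{ $table->{rows} || [] }) {
        for my $cell (@{ $row->{cells} || [] }) {
            push @cells, $cell->{data} // '';   # empty cells may have no data
        }
    }
    return @cells;
}

# Only runs when a filename is given on the command line.
if (my $file = shift @ARGV) {
    print "$_\n" for first_table_cells(slurp_file($file));
}
```

Saved under a name of your choosing, it would be run as: perl yourscript.pl file.html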

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
