kalyanrajsista has asked for the wisdom of the Perl Monks concerning the following question:

Hello all

I'm trying to extract data from HTML tables.

If I've already specified the headers when creating the object, do I still need to call

$ts[0]->rows

explicitly? Or is there another way to get the data, since I'm only trying to match the one table that has the specified headers?

use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;
# $content contains the actual HTML code extracted from the webpage
my $content;
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Please assume that $content has some data
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
my $te = HTML::TableExtract->new(headers => ['Name', 'Place', 'Country', 'Telephone']);
my @ts = $te->parse($content)->tables;
my @data = $ts[0]->rows;
print Dumper(@data);
$VAR1 = [
          'Justin',
          'California',
          undef,
          '12345'
        ];
$VAR2 = [
          'Catherine',
          'Texas',
          'USA',
          '2419422'
        ];

Am I doing anything wrong?
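One thing visible in the output above: the empty Country cell for Justin comes back as undef, so printing the rows with join under use warnings will report "Use of uninitialized value". If empty strings are preferred, here is a minimal sketch of normalizing them. It uses the two rows from the Dumper output as literal data, standing in for what $ts[0]->rows returned.

```perl
use strict;
use warnings;

# Stand-in for the rows returned by $ts[0]->rows in the question
# (copied from the Dumper output; the empty Country cell is undef).
my @data = (
    [ 'Justin',    'California', undef, '12345'   ],
    [ 'Catherine', 'Texas',      'USA', '2419422' ],
);

# Build new rows with undef cells replaced by empty strings,
# so join() below does not warn. @data itself is not modified.
my @clean = map { [ map { defined $_ ? $_ : '' } @$_ ] } @data;

print join(',', @$_), "\n" for @clean;
```

Because map builds fresh array references, the original @data still holds undef, in case you need to tell "empty cell" apart from "empty string" later.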

Re: Extract HTML data
by stefbv (Priest) on Dec 10, 2009 at 16:16 UTC
    If I understood the question correctly, then (from the docs):
    # Shorthand...top level rows() method assumes the first table found in
    # the document if no arguments are supplied.
    foreach my $row ($te->rows) {
        print join(',', @$row), "\n";
    }
    Then the answer might be:
    use strict;
    use warnings;
    use HTML::TableExtract;
    use Data::Dumper;
    my $html_string = qq{
    <html>
    <head>
    <title>test</title>
    </head>
    <body>
    <table>
    <tr><th>Name</th><th>Place</th><th>Country</th><th>Telephone</th></tr>
    <tr><td>Justin</td><td>California</td><td></td><td>12345</td></tr>
    <tr><td>Catherine</td><td>Texas</td><td>USA</td><td>2419422</td></tr>
    </table>
    </body>
    </html>
    };
    my $te = HTML::TableExtract->new(
        headers => ['Name', 'Place', 'Country', 'Telephone'],
    );
    $te->parse($html_string);
    print Dumper($te->rows);
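On the original question of whether you must go through $ts[0] at all: the top-level $te->rows shorthand from the docs, $te->first_table_found->rows, and indexing into $te->tables should all reach the same first matching table, so with a single header-matched table the shorthand ought to be enough. Below is a minimal sketch comparing the three on an inline table (the HTML is illustrative, modelled on the reply above). It assumes HTML::TableExtract 2.x, where first_table_found() and the top-level rows() are documented.

```perl
use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;

# Illustrative HTML only: one table whose headers match the ones we ask for.
my $html = <<'HTML';
<table>
<tr><th>Name</th><th>Place</th><th>Country</th><th>Telephone</th></tr>
<tr><td>Justin</td><td>California</td><td></td><td>12345</td></tr>
<tr><td>Catherine</td><td>Texas</td><td>USA</td><td>2419422</td></tr>
</table>
HTML

my $te = HTML::TableExtract->new(
    headers => [ 'Name', 'Place', 'Country', 'Telephone' ],
);
$te->parse($html);
$te->eof;    # tell HTML::Parser no more input is coming (flushes buffered text)

# Three ways to reach the rows of the first matching table.
my @tables    = $te->tables;
my @explicit  = $tables[0]->rows;              # what the question does
my @first     = $te->first_table_found->rows;  # named accessor
my @shorthand = $te->rows;                     # top-level shorthand from the docs

print Dumper(\@shorthand);
```

If more than one table could match the headers, tables() is still the way to pick a specific one; the shorthand simply takes the first.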
      My intention is to retrieve the data directly as an array of arrays, rather than as the 0th or 1st element (which is itself an array) of an array.
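On the follow-up: the list returned by rows() is already an array of arrays in Perl terms, one array reference per table row. The separate $VAR1 and $VAR2 in the question's output come from Dumper(@data), which dumps each list element on its own; passing a reference, Dumper(\@data), shows the same data as one nested structure. Here is a minimal sketch using the question's two rows as literal data, standing in for what $te->rows or $ts[0]->rows returns.

```perl
use strict;
use warnings;
use Data::Dumper;

# Stand-in for: my @data = $te->rows;  (one array ref per table row)
my @data = (
    [ 'Justin',    'California', undef, '12345'   ],
    [ 'Catherine', 'Texas',      'USA', '2419422' ],
);

# Cells are reached with two subscripts: $data[$row][$column].
print "$data[1][0] lives in $data[1][1]\n";    # Catherine lives in Texas

# Dumper(\@data) prints the whole array of arrays as a single $VAR1.
print Dumper(\@data);

# Walk the rows directly; $row is an array reference.
for my $row (@data) {
    printf "%-10s %s\n", $row->[0], $row->[3];    # Name and Telephone
}
```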