Extract HTML data

kalyanrajsista has asked for the wisdom of the Perl Monks concerning the following question:

Hello all

I'm trying to extract data from HTML Tables.

If I've specified the headers when creating the object, do I still need to call

$ts[0]->rows

explicitly, or is there another way to get the data? I'm trying to match only one table with the specified headers.

use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;    # needed for Dumper() below

# $content holds the actual HTML code extracted from the web page
my $content;

# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Please assume that $content has some data
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

# Extract only tables containing these column headers
my $te = HTML::TableExtract->new(headers => ['Name', 'Place', 'Country', 'Telephone']);

my @ts = $te->parse($content)->tables;

# Rows of the first (and only) matching table
my @data = $ts[0]->rows;

print Dumper(@data);

$VAR1 = [
          'Justin',
          'California',
          undef,
          '12345'
        ];
$VAR2 = [
          'Catherine',
          'Texas',
          'USA',
          '2419422'
        ];

Am I doing anything wrong?
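A likely source of the confusion, sketched here with core Perl only: `Dumper(@data)` receives the rows as a flat argument list, so each row is printed as its own `$VAR1`, `$VAR2`, and so on. Passing a reference to the array shows `@data` as a single array of arrays. The rows below are hard-coded from the output above as a stand-in for `$ts[0]->rows`; no HTML is parsed.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the rows returned by $ts[0]->rows,
# copied from the Dumper output in the question.
my @data = (
    [ 'Justin',    'California', undef, '12345'   ],
    [ 'Catherine', 'Texas',      'USA', '2419422' ],
);

# Dumper(@data) flattens the array into its argument list, so each
# row is dumped separately as $VAR1, $VAR2, ...
print Dumper(@data);

# Passing a reference dumps the whole array of arrays as one $VAR1.
print Dumper(\@data);
```

In other words, `@data` was already an array of arrays; only the way it was handed to `Dumper` made it look like separate arrays.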


Re: Extract HTML data by stefbv (Priest) on Dec 10, 2009 at 16:16 UTC
If I understood the question right, then (from the docs):

# Shorthand...top level rows() method assumes the first table found in
# the document if no arguments are supplied.
foreach $row ($te->rows) {
    print join(',', @$row), "\n";
}

Then the answer might be:

use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;

my $html_string = qq{
<html>
<head>
<title>test</title>
</head>
<body>
<table>
<tr><th>Name</th><th>Place</th><th>Country</th><th>Telephone</th></tr>
<tr><td>Justin</td><td>California</td><td></td><td>12345</td></tr>
<tr><td>Catherine</td><td>Texas</td><td>USA</td><td>2419422</td></tr>
</table>
</body>
</html>
};

my $te = HTML::TableExtract->new(
    headers => ['Name', 'Place', 'Country', 'Telephone'],
);
$te->parse($html_string);
print Dumper($te->rows);
Re^2: Extract HTML data by kalyanrajsista (Scribe) on Dec 11, 2009 at 11:11 UTC
My intention is to retrieve the data directly as an array of arrays, rather than as the 0th or 1st element (which is itself an array) of an array.
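A minimal sketch, assuming the rows have already been fetched (via `$ts[0]->rows` or the top-level `$te->rows` shorthand quoted above): the list that `rows` returns is a list of array references, which is already Perl's array of arrays, so cells can be indexed and iterated directly. The hard-coded `@data` is a hypothetical stand-in for that call, and its `undef` mirrors the empty Country cell in the question's output.

```perl
use strict;
use warnings;

# Hypothetical stand-in for: my @data = $te->rows;
# Each element is a reference to one row's cells.
my @data = (
    [ 'Justin',    'California', undef, '12345'   ],
    [ 'Catherine', 'Texas',      'USA', '2419422' ],
);

# Index a cell directly: row 1, column 2 (both zero-based).
print "Country of row 1: $data[1][2]\n";    # USA

# Walk every row; the empty cell comes back as undef, so map it to ''
# before join() to avoid "uninitialized value" warnings.
for my $row (@data) {
    print join( ',', map { defined $_ ? $_ : '' } @$row ), "\n";
}
```

The `defined` check matters in the docs' `join` loop too: under `use warnings`, joining a row that contains `undef` emits a warning.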