in reply to optimizing a parser running HTML::TableExtract to fetch only some labels and values [row by row]

From the documentation for HTML::TableExtract, emphasis added:

rows()

Return all rows within a matched table. Each row returned is a reference to an array containing the text, HTML, or reference to the HTML::Element object of each cell depending the mode of extraction. Tables with rowspan or colspan attributes will have some cells containing undef. Returns a list or a reference to an array depending on context.

You need to decide how to handle cells that span rows or columns. If you just want to ignore them, you can use grep to filter them out of each row, e.g. print "a cell: $_\n" for grep {defined} @$row;
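As a minimal sketch of the two obvious choices (drop the undefined cells, or keep column positions by substituting an empty string), something like the following could work. The sample @rows data and the helper names defined_cells and padded_cells are made up for illustration; they only mimic the array references that rows() is documented to return and are not part of HTML::TableExtract.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for what $table->rows might return:
# each row is an array reference, and spanned cells show up as undef.
my @rows = (
    [ 'Name',  'Value', undef ],
    [ 'alpha', undef,   '42'  ],
);

# Option 1: drop undefined cells entirely (remaining cells shift left).
sub defined_cells {
    my ($row) = @_;
    return grep { defined } @$row;
}

# Option 2: keep column positions by substituting an empty string.
sub padded_cells {
    my ($row) = @_;
    return map { defined $_ ? $_ : '' } @$row;
}

for my $row (@rows) {
    print "filtered: ", join(',', defined_cells($row)), "\n";
    print "padded:   ", join(',', padded_cells($row)), "\n";
}
```

Filtering is simpler, but if you later pick values by column index, padding is probably the safer choice because the positions stay aligned.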


Re^2: optimizing a parser running HTML::TableExtract to fetch only some labels and values [row by row]
by codeacrobat (Chaplain) on Dec 19, 2010 at 21:55 UTC
    ... or disable the warnings with no warnings 'uninitialized';

    print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});
Re^2: optimizing a parser running HTML::TableExtract to fetch only some labels and values [row by row]
by Perlbeginner1 (Scribe) on Dec 19, 2010 at 22:04 UTC


    Well, I could probably do the following to get rid of the uninitialized value warnings.
    Some of the table cells are empty, so we can test for them or filter them out, for example:

    foreach my $table ( $te->tables ) {
        foreach my $row ( $table->rows ) {
            my @values = grep { defined } @$row;
            print " ", join(',', @values), "\n";
        }
    }


    Another option: we could disable the warnings outright for this particular block with no warnings 'uninitialized', but that is generally not good practice.
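If the warnings do get switched off, it is probably worth keeping the pragma in the smallest possible scope. A rough sketch, assuming a made-up row and a hypothetical helper named join_quietly (neither comes from HTML::TableExtract):

```perl
use strict;
use warnings;

# Hypothetical row with an undefined cell, as a spanned table might produce.
my @row = ( 'label', undef, 'value' );

sub join_quietly {
    my (@cells) = @_;
    # The pragma is lexically scoped: it only affects code inside this sub,
    # so uninitialized-value warnings stay enabled everywhere else.
    no warnings 'uninitialized';
    return join ',', @cells;
}

print join_quietly(@row), "\n";
```

Because the pragma is lexical, a genuine uninitialized value elsewhere in the script would still be reported.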

    What do you think?