in reply to optimizing a parser running HTML::TableExtract to fetch only some labels and values [row by row]
From the documentation for HTML::TableExtract, emphasis added:
rows(): Return all rows within a matched table. Each row returned is a reference to an array containing the text, HTML, or reference to the HTML::Element object of each cell depending the mode of extraction. Tables with rowspan or colspan attributes will have some cells containing undef. Returns a list or a reference to an array depending on context.
You need to decide how to handle cells that span rows or columns. If you just want to ignore the resulting undef cells, you can use grep to filter them out of each row, e.g. print "a cell: $_\n" for grep {defined} @$row; (where $row is one of the array references returned by rows()).
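To make the filtering concrete, here is a minimal sketch that walks every row of every matched table and prints only the defined cells. The sample HTML and the helper name defined_cells are made up for illustration and are not from the original question; the sketch assumes HTML::TableExtract is installed and is constructed with no constraints, so it matches every table in the document.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample table. The last row uses colspan, which is the
# situation the documentation says can leave undef cells in a row.
my $html = <<'HTML';
<table>
  <tr><th>Label</th><th>Value</th></tr>
  <tr><td>Name</td><td>Example</td></tr>
  <tr><td colspan="2">Spanning note</td></tr>
</table>
HTML

# Hypothetical helper: given one row (an array reference),
# return only the cells that are defined.
sub defined_cells {
    my ($row) = @_;
    return grep { defined } @$row;
}

my $te = HTML::TableExtract->new;   # no constraints given
$te->parse($html);
$te->eof;                           # signal end of input to the parser

for my $table ($te->tables) {
    for my $row ($table->rows) {    # each $row is an array reference
        print "a cell: $_\n" for defined_cells($row);
    }
}
```

Note that the grep is applied to the cells of a single row, not to the list of rows: the rows themselves are array references and are always defined, so filtering at that level would not remove anything.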
Re^2: optimizing a parser running HTML::TableExtract to fetch only some labels and values [row by row]
by codeacrobat (Chaplain) on Dec 19, 2010 at 21:55 UTC
Re^2: optimizing a parser running HTML::TableExtract to fetch only some labels and values [row by row]
by Perlbeginner1 (Scribe) on Dec 19, 2010 at 22:04 UTC