Re^2: Question on extracting HTML tables with HTML::TableExtract

Reading the same docs, it looks like keep_html keeps any markup within the cell (e.g. text formatting) but not the tags that define the table. That it doesn't keep tables embedded within a cell makes me think it removes all table structure tags.
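A quick way to check that reading of keep_html is to run it against a tiny made-up table and look at what comes back per cell. This is an untested sketch: the sample HTML and the sub name cells_with_markup are mine, and it uses only new(keep_html => 1), parse/eof, first_table_found and rows. If the reading above is right, the <b>/<i> markup should survive while the <tr>/<td> tags should not.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up sample table, just for the experiment.
my $html = <<'HTML';
<table>
  <tr><th>Name</th><th>Note</th></tr>
  <tr><td><b>Fee</b></td><td><i>Fie</i></td></tr>
</table>
HTML

# Hypothetical helper: return every row of the first table found,
# with keep_html turned on so in-cell markup is left in place.
sub cells_with_markup {
    my ($doc) = @_;
    my $te = HTML::TableExtract->new(keep_html => 1);
    $te->parse($doc);
    $te->eof;
    my $table = $te->first_table_found or return ();
    # rows() gives array refs of cell contents; map undef cells to ''.
    return map { [ map { defined $_ ? $_ : '' } @$_ ] } $table->rows;
}

for my $row (cells_with_markup($html)) {
    print join(' | ', @$row), "\n";
}
```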

The synopsis does give a different method that looks like the right thing, though:

$table_html = $table_tree->as_HTML;
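Fleshed out a little, the tree-mode route from the synopsis might look like the sketch below. Assumptions: tree mode needs HTML::TreeBuilder and HTML::ElementTable installed, and first_table_html is just a name I made up. Unlike keep_html, as_HTML on the table tree should give back the table's own tags along with the cell contents.

```perl
use strict;
use warnings;
# Importing 'tree' switches HTML::TableExtract into tree mode.
use HTML::TableExtract qw(tree);

# Hypothetical helper: return the markup of the first table in $doc,
# <table>/<tr>/<td> tags included, or undef if no table is found.
sub first_table_html {
    my ($doc) = @_;
    my $te = HTML::TableExtract->new;
    $te->parse($doc);
    $te->eof;
    my $table = $te->first_table_found or return;
    my $table_tree = $table->tree;    # an HTML::ElementTable object
    return $table_tree->as_HTML;      # serialize the table back to HTML
}

print first_table_html('<table><tr><td><b>x</b></td></tr></table>'), "\n";
```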

I haven't tried either one; maybe later today if the OP doesn't solve it.

Re^3: Question on extracting HTML tables with HTML::TableExtract by ww (Archbishop) on Feb 26, 2012 at 19:48 UTC
We appear to read the two-line paragraph on keep_html differently. IMO it's open to multiple readings, but the next paragraph (re strip...) seems to support your view more than mine. So I bet we can agree without dissent that the doc needs some improvement. That said, let's think about HTML::Parser. By itself it won't do the whole job for the OP (and, as noted above, we don't know how the OP is doing whatever led to the SOPW), but with just a little additional code and a few tweaks to the example entitled "The Identity Parser", it should be no great problem to achieve the OP's objective.
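A rough sketch of that idea, assuming the "tweak" is simply: echo each token's original text, as an identity parser does, but only while inside the wanted top-level <table>. extract_table_html and its $want argument are hypothetical names; the HTML::Parser calls (api_version 3, start_h/end_h/default_h with the 'tagname, text' and 'text' argspecs) are standard. Nested tables are kept as part of the table that contains them.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return the raw HTML of the $want-th top-level
# table in $doc (1 = first), byte-for-byte as it appeared in the source.
sub extract_table_html {
    my ($doc, $want) = @_;
    $want ||= 1;
    my ($depth, $seen, $grab, $out) = (0, 0, 0, '');

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $text) = @_;
            if ($tag eq 'table') {
                $seen++ if $depth == 0;    # count top-level tables only
                $depth++;
                $grab = 1 if $depth == 1 && $seen == $want;
            }
            $out .= $text if $grab;        # identity: echo original text
        }, 'tagname, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            $out .= $text if $grab;
            if ($tag eq 'table' && $depth > 0) {
                $depth--;
                $grab = 0 if $depth == 0;  # left the wanted table
            }
        }, 'tagname, text' ],
        # Text, comments, etc. go through unchanged while grabbing.
        default_h => [ sub { $out .= $_[0] if $grab }, 'text' ],
    );
    $p->parse($doc);
    $p->eof;
    return $out;
}
```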
Re^4: Question on extracting HTML tables with HTML::TableExtract by bitingduck (Deacon) on Feb 26, 2012 at 20:07 UTC
Lack of clarity in module docs has always been the thing that bugged me most about learning Perl, so I end up doing quite a bit of experimental programming. That said, I use Perl mostly because of CPAN: anything I want to do, someone else has mostly solved already. I liked this problem because when I did it myself a long time ago for a scraper (one that has been running a few times a week for a few years), I did it the brute-force way, with a regex that identifies the text around the table that tells me it's the one I want. Then it goes into HTML::TreeBuilder to get the data I want. I was going to suggest using HTML::TreeBuilder, until I read the docs, which already had the example almost written.
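For comparison, a sketch of letting HTML::TreeBuilder do the table-finding too, instead of a regex over the raw page. Assumptions: table_rows_matching and $marker are hypothetical names, and the marker is looked for in the table's own text rather than in the text around it; with nested tables, the outer table matches first. It uses new_from_content, look_down (tag, regex and code-ref criteria), as_text and delete.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: find the first table whose text contains $marker
# and return its rows as array refs of cell text.
sub table_rows_matching {
    my ($doc, $marker) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($doc);
    my ($table) = $tree->look_down(
        _tag => 'table',
        sub { $_[0]->as_text =~ /\Q$marker\E/ },
    );
    my @rows;
    if ($table) {
        for my $tr ($table->look_down(_tag => 'tr')) {
            push @rows,
                [ map { $_->as_text } $tr->look_down(_tag => qr/^t[dh]$/) ];
        }
    }
    $tree->delete;    # HTML::Element trees should be freed explicitly
    return @rows;
}
```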