Re: Question on extracting HTML tables with HTML::TableExtract

Sorry, I misread this: ~~http://www.perl.com/pub/2003/09/17/perlcookbook.html~~ -- it's NOT relevant to retaining the html markup.

So, back to HTML::TableExtract -- the documentation makes it very clear that keep_html SHOULD keep the markup.

Thus, the question becomes, could there be an error in your code? We won't know the answer to that until you post it... and a bit of information about how you're acquiring the table in the first place.

Re^2: Question on extracting HTML tables with HTML::TableExtract by bitingduck (Deacon) on Feb 26, 2012 at 19:22 UTC
Reading the same docs, it looks like keep_html keeps any markup within the cell (e.g. text formatting) but not the tags that define the table. That it doesn't keep embedded tables within a cell makes me think it removes all table structure tags. The synopsis does give a different method that looks like the right thing, though: `$table_html = $table_tree->as_HTML;` I haven't tried either one; maybe later on today if the OP doesn't solve it.
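For reference, here is a minimal sketch of that tree-mode route, pieced together from the calls shown in the synopsis. The sample HTML is made up (we still haven't seen the OP's page), and I'm assuming HTML::TreeBuilder and HTML::ElementTable are installed, since tree mode needs both. Treat it as a starting point rather than a verified answer.

```perl
use strict;
use warnings;
# Importing 'tree' switches the module to tree mode, which also
# requires HTML::TreeBuilder and HTML::ElementTable to be installed.
use HTML::TableExtract qw(tree);

# Hypothetical sample document; the OP's actual HTML is unknown.
my $html = <<'END_HTML';
<html><body>
<table>
<tr><th>Name</th><th>Note</th></tr>
<tr><td>alpha</td><td><b>bold</b> text</td></tr>
</table>
</body></html>
END_HTML

# No headers/depth/count constraints, so every table in the document matches.
my $te = HTML::TableExtract->new();
$te->parse($html);
$te->eof;    # signal end of input so the tree is finalized

my $table = $te->first_table_found
    or die "no table found\n";

# as_HTML serializes the table element itself, structure tags included.
my $table_html = $table->tree->as_HTML;
print $table_html, "\n";
```

If this behaves as the synopsis suggests, the printed string should contain the table, row and cell tags along with the in-cell `<b>` markup, which is what keep_html alone does not seem to give you.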
Re^3: Question on extracting HTML tables with HTML::TableExtract by ww (Archbishop) on Feb 26, 2012 at 19:48 UTC
We appear to read the two-line para on keep_html differently. IMO, it's open to multiple readings, but the next para (re strip...) seems to support your view more than mine. So, I bet we can agree without dissent that the doc needs some improvement. That said, let's think about Parse::HTML. It won't do the whole job by itself for OP (but, as noted above, we don't know how OP is doing whatever led to the SOPW). Still, with just a little additional code and a few tweaks to the code in the example entitled "The Identity Parser", it should be no great problem to achieve OP's objective.
Re^4: Question on extracting HTML tables with HTML::TableExtract by bitingduck (Deacon) on Feb 26, 2012 at 20:07 UTC
Clarity in module docs has always been the thing that bugged me the most about learning Perl, so I end up doing quite a bit of experimental programming. That said, I use Perl mostly because of CPAN: anything I want to do, someone else has mostly solved already. I liked this problem because when I did it myself a long time ago for a scraper (that's been running a few times a week for a few years), I did it the brute-force way, with a regex that identifies the text around the table to tell me it's the one I want. Then it goes into HTML::TreeBuilder to get the data I want. I was going to suggest using TreeBuilder, until I read the docs, which had the example almost written already.