Re^3: Question on extracting HTML tables with HTML::TableExtract

We appear to read the 2 line para on keep_html differently. IMO, it's open to multiple readings, but the next para (re strip...) seems more to support your view than mine.

So, I bet we can agree without dissent that the doc needs some improvement.

That said, let's think about Parse::HTML. It won't do the whole job, by itself, for OP (but as noted above, we don't know how OP is doing whatever led to the SOPW), but with no added code... and just a little additional, and a few tweaks to the code in the example entitled "The Identity Parser, it should be no great problem achieve OP's objective.

Comment on Re^3: Question on extracting HTML tables with HTML::TableExtract

Replies are listed 'Best First'.
Re^4: Question on extracting HTML tables with HTML::TableExtract by bitingduck (Deacon) on Feb 26, 2012 at 20:07 UTC
Clarity in module docs has always been the thing that bugged me the most about learning Perl, so I end up doing quite a bit of experimental programming. That said, I use Perl mostly because of CPAN-- anything I want to do, someone else has mostly solved already. I liked this problem because when I did it myself a long time ago for a scraper (that's been running a few times a week for a few years) I did it the brute force way with a regex and identifying the text around it that tells me it's the table I want. Then it goes into HTML::Treebuilder to get the data I want. I was going to suggest using Treebuilder, until I read the docs, which had the example almost written already.	[reply]

Replies are listed 'Best First'.

Re^4: Question on extracting HTML tables with HTML::TableExtract
by bitingduck (Deacon) on Feb 26, 2012 at 20:07 UTC

Clarity in module docs has always been the thing that bugged me the most about learning Perl, so I end up doing quite a bit of experimental programming. That said, I use Perl mostly because of CPAN-- anything I want to do, someone else has mostly solved already. I liked this problem because when I did it myself a long time ago for a scraper (that's been running a few times a week for a few years) I did it the brute force way with a regex and identifying the text around it that tells me it's the table I want. Then it goes into HTML::Treebuilder to get the data I want. I was going to suggest using Treebuilder, until I read the docs, which had the example almost written already.

[reply]