in reply to Re: extracting data from HTML
in thread extracting data from HTML
it's alright to be biased
I do like the idea of being as up to date as possible, but I sometimes have the suspicion that the Perl community can't keep pace with all the changes anyway. There still isn't a single package that does XSLT 2.0, XPath 2.0 and so on. Partly we rely on libxml2, which is not going to get an update to the next level.
I managed to get HTML::TreeBuilder::XPath working and am playing around with it at the moment. Getting the right text out of the HTML source with XPath is still quite a struggle and frequently results in errors... but... I am getting to grips with it, and it feels more reliable than running regexes on the source, especially since some parts consist of more than one <p> element. ->findvalues() does a nice trick. I only need to get rid of the nasty cp1252 codes that slipped into the iso-8859-1 encoded HTML; the € symbol isn't part of iso-8859-1.
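For the cp1252 problem, here is a minimal sketch of what I have in mind, assuming the page really is cp1252 that was mislabelled as iso-8859-1. The two encodings agree on every printable byte, so decoding the raw bytes as cp1252 should be safe for the rest of the text. The helper name fix_cp1252 and the sample bytes are made up for illustration; only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes as cp1252 instead of iso-8859-1.
# Bytes 0x80-0x9F are control codes in iso-8859-1, but cp1252 maps most
# of them to printable characters such as the euro sign (0x80).
sub fix_cp1252 {
    my ($bytes) = @_;
    return decode('cp1252', $bytes);
}

# Made-up sample: 0x80 is the euro sign in cp1252,
# 0xE9 is e-acute in both encodings.
my $raw  = "Price: \x80 5, caf\xE9";
my $text = fix_cp1252($raw);

binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

The decoded character string, rather than the raw bytes, would then be what gets handed to the parser.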
I do not want to start a war between the monks, but please enlighten me further on why I should use HTML5 instead of TreeBuilder.
Re^3: extracting data from HTML
by tobyink (Canon) on Jun 03, 2012 at 19:49 UTC