it's alright to be biased

I do like the idea of being as up to date as possible, but I sometimes have the suspicion that the Perl community can't keep pace with all the changes anyway. There still isn't a single package that does XSLT 2.0, XPath 2.0 and so on. In part we rely on libxml2, which is not going to be updated to the next level.

I managed to get HTML::TreeBuilder::XPath working and am playing around with it at the moment. Getting the right text from the HTML source with XPath is quite a struggle anyway, frequently resulting in errors... but... I am getting to grips with it, and I feel more confident than when running regexes on the source, especially since some parts consist of more than one <p> element. ->findvalues() does a nice trick. I only need to get rid of the nasty cp1252 codes that slipped into the iso-8859-1 encoded HTML; the € symbol isn't part of that character set.
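For the cp1252 bytes, one possible approach is sketched below. It assumes the page is really cp1252 mislabelled as iso-8859-1, so the raw bytes are simply decoded as cp1252 with the core Encode module. The sub name decode_mislabelled and the sample bytes are made up for illustration; they are not from the actual page.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: treat bytes labelled iso-8859-1 as cp1252 instead.
# The two encodings agree for bytes 0xA0-0xFF, so accented letters are
# unaffected, while byte 0x80 becomes the euro sign (U+20AC).
sub decode_mislabelled {
    my ($bytes) = @_;
    return decode('cp1252', $bytes);
}

# Made-up sample: 0x80 is the cp1252 euro, 0xE9 is e-acute in both encodings.
my $raw  = "Prix: \x80 10, caf\xE9";
my $text = decode_mislabelled($raw);

binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

The decoded character string could then be handed to the parser in place of the raw bytes, so that the XPath results already contain the right characters.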

I do not want to start a war between the monks, but please enlighten me further on why to use HTML5 instead of TreeBuilder.


In reply to Re^2: extracting data from HTML by Jurassic Monk
in thread extracting data from HTML by Jurassic Monk
