That's a good point about the desired output encoding. I'm only reading the HTML to produce another HTML file, so it would suit me just as well to read the text raw, without decoding the HTML entities. Is there something similar to get_text that just delivers the text without interpreting it first? I've not found anything like that when reading about HTML::TokeParser.

It may be that the problem can be solved by looking at character encodings, as people have suggested, but in case that falls through, I'd also like to look at the possibility of reading uninterpreted HTML.
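One possible way to read the text uninterpreted is sketched below. It relies on my understanding that get_token returns text tokens as they appear in the source, and that get_text is the step that decodes entities. The helper name get_raw_text is hypothetical (it is not part of HTML::TokeParser), and the sample HTML is made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not part of HTML::TokeParser): collect text up to,
# but not including, the given end tag, leaving entities such as &rsquo;
# exactly as they are written in the source.
sub get_raw_text {
    my ($parser, $end_tag) = @_;    # $end_tag looks like "/p"
    my $raw = '';
    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        if ($type eq 'T') {
            # Text token: $token->[1] is the text, appended here undecoded.
            $raw .= $token->[1];
        }
        elsif ($type eq 'E' && defined $end_tag && "/$token->[1]" eq $end_tag) {
            # Put the end tag back so the caller can still see it, then stop.
            $parser->unget_token($token);
            last;
        }
    }
    return $raw;
}

# Made-up sample input for illustration.
my $html = '<p>It&rsquo;s &lsquo;quoted&rsquo; text</p>';
my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser: $!";
$p->get_tag('p');
print get_raw_text($p, '/p'), "\n";
```

Unlike get_text, this sketch skips every tag it meets inside the element rather than textifying any of them, which seems acceptable when the text is only being copied into another HTML file.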

Thanks for your help


In reply to Re^2: HTML::TokeParser, get_text scrambling rsquo and lsquo by tridral
in thread HTML::TokeParser, get_text scrambling rsquo and lsquo by tridral
