To get an answer to that, you'd have to tell us what is at line 2, column 25, byte 68 of your input. (And likely some of the surrounding text as well.)

Sorry, that's what I get for trying to reply at that hour. I have replicated the problem now by downloading your source and running recode utf8..latin1 on it.

The problem occurs in the XML parser, before the data ever reaches Twig. The Twig encoding filters convert already-parsed information from one encoding to another, but they can't affect the parsing itself. The issue is that your XML declaration has told the parser to expect one encoding, but it has received another. In other words, if you had:

__DATA__
<?xml version = '1.0' encoding = 'iso-8859-1'?>
<Text>5CH (the BACKSLASH ý\ý in ISO-IR 6) shall</Text>
... and you were absolutely certain that your source file had latin-1 encoding, you wouldn't have to mess with input filters at all. This would be sufficient to deal with it:
my $twig = XML::Twig->new();
$twig->parse( $xml );

If you later recoded that file to utf-8 (via something like recode latin1..utf8 filename), you might have problems with the charset again, though odds are that it would actually parse and give you garbage. THEN you might need to play with an input filter, not to get the parsing working, but to convert the garbage you got out of it to what you wanted.
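If you want to confirm this kind of mismatch before handing the data to the parser, something like the following might help. It is only a rough sketch using the core Encode module: the helper names declared_encoding and bytes_match_declaration are made up for illustration, and the sample document is not your actual data. It reads the encoding named in the XML declaration and checks whether the raw bytes decode cleanly under it.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return the encoding named in the XML declaration,
# falling back to UTF-8 (the XML default) when none is declared.
sub declared_encoding {
    my ($bytes) = @_;
    return $1
        if $bytes =~ /\A<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']/;
    return 'UTF-8';
}

# Hypothetical helper: true if the raw bytes decode cleanly as the
# encoding the declaration claims.
sub bytes_match_declaration {
    my ($bytes) = @_;
    my $enc  = declared_encoding($bytes);
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    return eval { decode( $enc, $copy, Encode::FB_CROAK ); 1 } ? 1 : 0;
}

# Sample document (an assumption for illustration) that claims to be UTF-8.
my $text = "<?xml version='1.0' encoding='utf-8'?><Text>caf\x{e9}</Text>";

my $as_utf8   = encode( 'UTF-8',      $text );    # e-acute becomes 0xC3 0xA9
my $as_latin1 = encode( 'ISO-8859-1', $text );    # e-acute becomes lone 0xE9

print bytes_match_declaration($as_utf8)   ? "ok\n" : "mismatch\n";
print bytes_match_declaration($as_latin1) ? "ok\n" : "mismatch\n";
```

Note that the check only works in one direction. Bytes that are really latin-1 but declared as UTF-8 will usually fail to decode, whereas any byte sequence is valid latin-1, so a UTF-8 file declared as iso-8859-1 passes this test and simply gives you garbage, as described above.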


In reply to Re^3: XML::Twig and UTF-8 by AZed
in thread XML::Twig and UTF-8 by bobf
