I am trying to parse an XML document with UTF-8 characters using XML::Twig, but I can't seem to deal with the encoding properly.
```perl
use strict;
use warnings;
use XML::Twig;

my $xml = do { local $/; <DATA>; };

#my $conv = XML::Twig::encode_convert( 'latin1'); # failed
my $conv = 'latin1'; # failed

my $twig = XML::Twig->new( input_filter => $conv );
$twig->parse( $xml );

__DATA__
<?xml version = '1.0' encoding = 'UTF-8'?>
<Text>5CH (the BACKSLASH ý\ý in ISO-IR 6) shall</Text>
```
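The post doesn't show the actual error message, so this is an assumption, but a likely culprit is that the bytes under `__DATA__` are not really UTF-8. If the script was saved as Latin-1, `ý` becomes the single byte 0xFD, and the parser will reject it no matter which `input_filter` is set. The sketch below uses only the core Encode module to check that. `is_valid_utf8` is a hypothetical helper for illustration and is not part of XML::Twig.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of XML::Twig or Encode):
# returns 1 if the byte string is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # decode() may modify its input when CHECK is set, so work on a copy.
    my $copy = $bytes;
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my $utf8_bytes   = "BACKSLASH \xC3\xBD\\\xC3\xBD";  # y-acute encoded as UTF-8 (2 bytes)
my $latin1_bytes = "BACKSLASH \xFD\\\xFD";          # y-acute as a single Latin-1 byte

print is_valid_utf8($utf8_bytes),   "\n";  # 1
print is_valid_utf8($latin1_bytes), "\n";  # 0
```

Running the same check on `$xml` just before `$twig->parse( $xml )` shows whether the problem lies in the input bytes or in the parser setup.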
I don't necessarily need to preserve the non-ASCII Unicode characters (conversion or masking is fine).
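Since masking is acceptable, one option (a sketch built on the core Encode module, not an XML::Twig feature) is to decode the UTF-8 bytes into a Perl character string and then re-encode it as ASCII with the `Encode::FB_XMLCREF` fallback. Every character outside ASCII is replaced by a hexadecimal XML character reference (`&#x...;`), so the result is pure ASCII and still well-formed XML. The sample string below assumes the input really is UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes, as they would arrive from <DATA> without a decoding layer.
my $bytes = "<Text>5CH (the BACKSLASH \xC3\xBD\\\xC3\xBD in ISO-IR 6) shall</Text>";

# Bytes -> Perl character string (the two bytes C3 BD become one character, U+00FD).
my $chars = decode('UTF-8', $bytes);

# Character string -> ASCII bytes; any character that ASCII cannot represent
# is written as an XML numeric character reference instead of causing an error.
my $masked = encode('ascii', $chars, Encode::FB_XMLCREF);

print "$masked\n";
```

An XML parser turns those references back into the original characters when it reads them. If you would rather drop them entirely, calling `encode('ascii', $chars)` with no third argument substitutes `?` for each unmappable character.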
I read the fine manual and fired off a Super Search or two, but I'm still stuck. Please enlighten me. :-)
In reply to XML::Twig and UTF-8 by bobf