in reply to Unicode XML Parsing Problem

As mentioned above, Microsoft has a bad habit of using the word "Unicode" when it really means UTF-16 little endian. That has led many developers to create XML documents like the one you sent, which carry the invalid encoding name "Unicode" in their XML declaration. With XML::LibXML, this means you can't simply call parse_file: you first have to read the file yourself and remove the faulty encoding attribute from the declaration. Here's a snippet that does it (assuming $xmlfile contains the filename):
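use XML::LibXML;

my $parser = XML::LibXML->new();

# Read the whole file, decoding UTF-16 (the BOM tells Perl the byte order)
open my $in, '<:encoding(UTF-16)', $xmlfile or die "Can't open $xmlfile: $!";
my $xmltext = do { local $/; <$in> };
close $in;

# Drop the bogus encoding="Unicode" attribute so libxml2 won't reject it
$xmltext =~ s/\s+encoding\s*=\s*(["'])Unicode\1//i;

# parse_string dies on a parse error, so trap it to report which file failed
my $doc = eval { $parser->parse_string($xmltext) }
    or die "Could not process XML file $xmlfile: $@";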
My own Windows app (the one mentioned on my home node) does precisely this.
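If you want to check the preprocessing step on its own, without a real file or XML::LibXML installed, here is a minimal sketch that uses only the core Encode module. The helper name clean_unicode_xml is hypothetical, and it assumes the input is UTF-16 with a BOM (falling back to little endian, which is an assumption about what Windows tools write). Unlike the snippet above, it works on raw bytes instead of an :encoding layer and only touches the encoding attribute inside the XML declaration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn the raw bytes of a "Unicode" (UTF-16) XML file
# into a Perl character string with the bogus encoding attribute removed.
sub clean_unicode_xml {
    my ($bytes) = @_;

    # Pick the byte order from the BOM; assume little endian otherwise.
    my $enc = ($bytes =~ /^\xFE\xFF/) ? 'UTF-16BE' : 'UTF-16LE';
    my $text = decode($enc, $bytes);

    # Explicit UTF-16LE/BE decoding keeps the BOM as U+FEFF, so drop it.
    $text =~ s/^\x{FEFF}//;

    # Remove encoding="Unicode" (or 'Unicode') only in the XML declaration.
    $text =~ s/^(<\?xml[^>]*?)\s+encoding\s*=\s*(["'])Unicode\2/$1/i;

    return $text;
}

# Usage: build a small UTF-16LE document with a BOM, as a Windows tool might.
binmode STDOUT, ':encoding(UTF-8)';
my $raw = "\xFF\xFE" . encode('UTF-16LE',
    qq|<?xml version="1.0" encoding="Unicode"?>\n<root>caf\x{e9}</root>\n|);
my $clean = clean_unicode_xml($raw);
print $clean;
# $clean can then be handed to XML::LibXML->new->parse_string($clean).
```

Working on bytes makes the BOM handling explicit, which is handy when the same code has to cope with files that came from different tools.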

Re^2: Unicode XML Parsing Problem
by SheridanCat (Pilgrim) on Sep 23, 2005 at 20:35 UTC
    Thanks to everyone who responded. This definitely points me in the right direction.