samtregar has asked for the wisdom of the Perl Monks concerning the following question:
not well-formed (invalid token) at line 514, column 188, byte 72499 at /usr/local/lib/perl5/site_perl/5.6.1/i686-linux/XML/Parser.pm line +185
I get a few dozen of these. All the bytes they're pointing to are high-ASCII characters of some sort. I'm guessing this means I need to do something special to output clean UTF-8. Somehow I thought XML::Writer would take care of that, but I guess not.
My first attempt was to use Unicode::Map8 to translate the input data from Latin1 (a guess at the character set) to UTF-8. That didn't work. So I tried the umap utility, which I've used successfully in similar circumstances before:
umap latin1:utf8 < data.old > data.new
But XML::Parser doesn't like data.new any better than data.old.
So I come to the monks, on bended knee. I'd be happy to get anything from a new debugging technique or an RTFM link to an outright solution. Thanks!
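As a debugging technique for the conversion step above, here is a minimal sketch that checks whether a converted file really is valid UTF-8 before it is handed to XML::Parser. It assumes iconv is installed and uses it in place of umap, which is a substitution and not what the post used. The sample contents of data.old are hypothetical: a single Latin-1 byte (0xE9) inside an element.

```shell
# Hypothetical sample input: "caf" followed by the Latin-1 byte 0xE9 (e-acute).
printf '<doc>caf\351</doc>\n' > data.old

# Convert Latin-1 to UTF-8. iconv stands in for "umap latin1:utf8" here (assumption).
iconv -f ISO-8859-1 -t UTF-8 < data.old > data.new

# Validate: a UTF-8 to UTF-8 pass exits non-zero at the first invalid byte sequence.
if iconv -f UTF-8 -t UTF-8 < data.new > /dev/null 2>&1; then
    echo "data.new is valid UTF-8"
else
    echo "data.new still contains invalid bytes"
fi
```

Running the same validation pass against data.old should fail, which confirms the check is actually exercising the bytes in question. Note that a clean result only shows the output is well-formed UTF-8. If the input was not Latin-1 to begin with, the characters can still be wrong.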
Re: Generating UTF-8 from nasty high ASCII input
by grantm (Parson) on Jul 10, 2002 at 11:14 UTC
by samtregar (Abbot) on Jul 10, 2002 at 16:37 UTC

Re: Generating UTF-8 from nasty high ASCII input
by IlyaM (Parson) on Jul 10, 2002 at 10:24 UTC
by samtregar (Abbot) on Jul 10, 2002 at 16:33 UTC

Re: Generating UTF-8 from nasty high ASCII input
by Joost (Canon) on Jul 10, 2002 at 13:38 UTC
by samtregar (Abbot) on Jul 10, 2002 at 16:32 UTC