in reply to 8-bit Clean XML Data I/O?
Of course you can encode the data in Base64, or in a smarter way if your data is mostly ASCII, for example. Beyond that, I don't really see how you can store data in XML if you don't know its encoding. You would think that you could find a nice encoding that covered byte values 0-255, which would allow you to parse the data and then later figure out what to do with it. The problem is that parsers tend to want to convert what they get into UTF-8. At least XML::Parser and XML::LibXML do this, so if you lie about the encoding of the data, you will get it back converted to UTF-8 from the wrong encoding... :--(
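To make the Base64 route concrete, here is a minimal sketch. It is written in Python rather than Perl purely for illustration, and the element name `blob`, the `encoding` attribute, and the helper names `wrap_bytes` and `unwrap_bytes` are all made up for this example. The point is only that the payload becomes plain ASCII, so the parser's conversion to UTF-8 cannot damage it.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_bytes(payload: bytes) -> str:
    """Return a small XML document carrying payload as Base64 text.

    The <blob> element and its "encoding" attribute are hypothetical
    names chosen for this sketch, not part of any standard.
    """
    elem = ET.Element("blob", {"encoding": "base64"})
    # Base64 output is pure ASCII, so it is valid in any XML encoding.
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_bytes(xml_text: str) -> bytes:
    """Parse the document and recover the original bytes."""
    elem = ET.fromstring(xml_text)
    # An empty payload serializes to an empty element, whose text is None.
    return base64.b64decode(elem.text or "")


if __name__ == "__main__":
    raw = bytes(range(256))  # every possible byte value
    doc = wrap_bytes(raw)
    print(unwrap_bytes(doc) == raw)
```

The cost is size (Base64 grows the data by about a third) and the fact that the payload is no longer readable in the XML itself.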
That said, XML::Twig has a mode (the keep_encoding option) in which it uses the original data instead of the UTF-8 version. You can get that data in XML::Parser too: use the original_string method on the XML::Parser::Expat object. But you have to make sure that, no matter what the real encoding is, the data will be valid for the "fake" one you declare your document to be in. I don't know enough about encodings to have a suggestion there.
But frankly, if I were dealing with sources in various encodings, I would try really hard to convert them all to Unicode before trying to hack something like this.
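As a rough sketch of what "get them all in Unicode" means when the source encoding is actually known (again in Python for illustration only, with `to_utf8` being a hypothetical helper name), the same function also shows what happens when the label is wrong: the bytes are still converted, just into the wrong characters.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw using the stated encoding and re-encode it as UTF-8.

    Hypothetical helper for this sketch. It only gives the right answer
    if source_encoding really is the encoding the bytes were written in.
    """
    return raw.decode(source_encoding).encode("utf-8")


# One byte, 0xE9, which is "e with acute accent" in ISO-8859-1.
original = "\u00e9".encode("iso-8859-1")

# Correct label: the character survives the trip to UTF-8.
right = to_utf8(original, "iso-8859-1")

# Wrong label: cp1251 also accepts byte 0xE9, but as a different
# character, so the conversion succeeds silently with the wrong result.
wrong = to_utf8(original, "cp1251")

if __name__ == "__main__":
    print(right.decode("utf-8") == "\u00e9")
    print(wrong == right)
```

The second case raises no error, which is why a mislabeled document can go unnoticed until someone reads the output.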
Re: Re: 8-bit Clean XML Data I/O?
by samtregar (Abbot) on Feb 21, 2004 at 00:02 UTC