in reply to 8-bit Clean XML Data I/O?

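It seems that by the time you write your document you'll have to have settled on UTF-8. You could encode your binary data to be Unicode-safe and then unescape it afterward. There isn't any "binary-data" character set in XML, so the straight write:binary/read:utf8 cycle won't work. Bytes that aren't valid UTF-8 can't be decoded as text at all, and XML 1.0 forbids most control characters (everything below 0x20 except tab, LF and CR), even as character references. You could just toss your XML parser and handle your input yourself.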
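To see why the raw cycle fails, here is a minimal sketch, in Python purely for illustration. It checks whether a byte string could appear as XML 1.0 character data at all. The bytes must be valid UTF-8, and every character must fall in the XML 1.0 `Char` production. The function name `is_xml_safe` is hypothetical.

```python
def is_xml_safe(data: bytes) -> bool:
    """True if `data` is valid UTF-8 and every character is allowed
    by the XML 1.0 Char production (tab, LF, CR, and most of Unicode)."""
    try:
        text = data.decode("utf-8")  # Python rejects invalid UTF-8 here
    except UnicodeDecodeError:
        return False
    for ch in text:
        cp = ord(ch)
        if not (cp in (0x9, 0xA, 0xD)
                or 0x20 <= cp <= 0xD7FF
                or 0xE000 <= cp <= 0xFFFD
                or 0x10000 <= cp <= 0x10FFFF):
            return False  # e.g. NUL or another C0 control character
    return True
```

Arbitrary 8-bit data fails one test or the other. A Latin-1 byte such as `0xE9` on its own is not valid UTF-8, and a `0x00` byte is never legal XML, so `is_xml_safe(bytes(range(256)))` is false.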

Re: Re: 8-bit Clean XML Data I/O?
by samtregar (Abbot) on Feb 20, 2004 at 23:22 UTC
    You could encode your binary data to be unicode safe and then unescape it afterward.

    How do I encode arbitrary 8-bit character data as UTF-8 so that I can get it back again when I read it?

    You could just toss your XML parser and handle your input yourself.

    After all the time I put into getting XML::Validator::Schema working, I can't imagine going that route.

    Thanks,
    -sam

      There's no magic encoding here: you have to either use an existing encode/decode scheme or write your own. You might just use Base64.
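For the Base64 route, here is a minimal round-trip sketch using Python's standard `base64` and `xml.etree.ElementTree` modules. This is an illustration only. In Perl, `MIME::Base64`'s `encode_base64`/`decode_base64` play the same role. The element names `record`/`data` and the `encoding="base64"` attribute are made up for the example.

```python
import base64
import xml.etree.ElementTree as ET

def to_xml(data: bytes) -> bytes:
    """Wrap arbitrary bytes in an XML document as Base64 text."""
    root = ET.Element("record")
    payload = ET.SubElement(root, "data", encoding="base64")  # hypothetical attribute
    payload.text = base64.b64encode(data).decode("ascii")      # pure ASCII, always XML-safe
    return ET.tostring(root, encoding="utf-8")

def from_xml(doc: bytes) -> bytes:
    """Parse the document and recover the original bytes."""
    root = ET.fromstring(doc)
    text = root.find("data").text or ""  # an empty element parses as None
    return base64.b64decode(text)
```

Every byte value survives, including NUL and invalid UTF-8 sequences. The cost is roughly a one-third size increase, and the payload is no longer human-readable.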
        Huh. It's a possibility. One problem I have with it is how much harder it would be to debug. Right now I can open up an XML file and find problems by simple inspection. With all my data in Base64, I'd have to process the XML before I could read it, which is pretty hard if, for example, the XML parser won't parse it!

        I wonder if I could make a subclass of XML::Writer that Base64-encodes strings containing non-UTF-8 characters and prefixes them with some kind of marker, so I'd know to reverse the encoding when reading. Of course, then I'd need a subclass of XML::Simple to get it back out again. Good lord, what a hack.

        -sam
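The marker idea might not need subclassing at all. A pair of helpers could be called on each value before writing and after reading. This is a hypothetical Python sketch, not XML::Writer or XML::Simple code. The names `encode_value` and `decode_value` and the `b64:` marker are all assumptions. Text that is already XML-safe UTF-8 passes through untouched, so the file stays readable, and only the problem values are Base64-encoded behind the marker. Any genuine value that happens to start with the marker is encoded too, which keeps decoding unambiguous.

```python
import base64

MARKER = "b64:"  # hypothetical prefix; choose one real data is unlikely to start with

def _passthrough_ok(cp: int) -> bool:
    # XML 1.0 Char production, minus CR: parsers normalize CR and CRLF
    # to LF, so a raw CR would not survive a write/parse round trip.
    return (cp in (0x9, 0xA) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)

def encode_value(data: bytes) -> str:
    """Return text that is safe to hand to an XML writer."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = None  # not UTF-8: must be encoded
    if (text is not None and not text.startswith(MARKER)
            and all(_passthrough_ok(ord(c)) for c in text)):
        return text  # readable value, left as-is
    return MARKER + base64.b64encode(data).decode("ascii")

def decode_value(text: str) -> bytes:
    """Reverse encode_value() on a string read back from the parser."""
    if text.startswith(MARKER):
        return base64.b64decode(text[len(MARKER):])
    return text.encode("utf-8")
```

Most values then read normally in the file, and only the odd binary field shows up as `b64:...`. That keeps most of the inspect-by-eye debugging.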