
Why doesn't XML have a way to handle arbitrary binary data? ... A good example is the XML tickers here: there are characters that can appear in a node and elsewhere that cannot be validly embedded in XML.

Make up your mind, are they characters or binary data? :-)

Certainly any character that can be represented in HTML should be representable in XML. For example, in HTML you could use &eacute; for 'é'. In XML you don't have the handy mnemonic name unless you use a DTD, but you can still represent the character with a numeric character reference: &#233; is 'é'. The HTML::Entities module can help with the conversion.
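A minimal sketch of that conversion, using only core Perl. HTML::Entities offers encode_entities_numeric for the same job (it emits hexadecimal references such as &#xE9;). The subroutine name xml_escape_numeric is made up for this example. It escapes the XML markup characters and turns every non-ASCII character into a decimal reference, so the output is plain ASCII:

```perl
use strict;
use warnings;

# Hypothetical helper: escape XML markup characters, then replace every
# non-ASCII character with a decimal numeric character reference,
# e.g. "\x{e9}" (é) becomes "&#233;".
sub xml_escape_numeric {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first so later output isn't re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

my $s = "caf\x{e9} <b>";
print xml_escape_numeric($s), "\n";    # prints: caf&#233; &lt;b&gt;
```

Note that this only helps with characters XML allows in the first place; it cannot rescue control characters such as NUL.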

Re^3: Encoding is a pain.
by steves (Curate) on Sep 25, 2004 at 13:49 UTC

    The character versus data distinction is important. XML does have a way to express non-ASCII characters, using the DTD as noted. For true binary data, CDATA sections almost do it, but they're not foolproof: the binary data could contain the sequence ]]>, which would make the section look like it ended before it really did. But you could encode the data using an agreed-upon scheme, such as uuencode or base64, and put that in a CDATA section. Ugly, but possible.
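A rough sketch of the base64 approach using the core MIME::Base64 module. The <data encoding="base64"> element and the two subroutine names are hypothetical; any agreed-upon convention would do. Since base64 output contains only letters, digits, '+', '/' and '=', it needs no escaping, so the CDATA wrapper becomes optional. The regex-based extraction is for illustration only; a real program should use an XML parser.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Wrap arbitrary bytes in a (hypothetical) <data> element as base64.
sub binary_to_xml {
    my ($bytes) = @_;
    # Empty end-of-line argument: don't break the base64 text into lines.
    my $b64 = encode_base64($bytes, '');
    return qq{<data encoding="base64">$b64</data>};
}

# Recover the original bytes (naive regex extraction, illustration only).
sub xml_to_binary {
    my ($xml) = @_;
    my ($b64) = $xml =~ m{<data encoding="base64">([^<]*)</data>}
        or die "no <data> element found\n";
    return decode_base64($b64);
}

my $bytes = "\x00\x01]]>\xFF";    # a NUL, a control char, a CDATA terminator, a high byte
my $xml   = binary_to_xml($bytes);
print "$xml\n";
print xml_to_binary($xml) eq $bytes ? "round trip ok\n" : "mismatch\n";
```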

      Actually, in XML 1.0 CDATA sections are no good for binary data even without the delimiter issue. A CDATA section is defined to contain Chars, which in turn are defined as:

      Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

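      So, for example, the control characters 0x00-0x08, 0x0B, 0x0C and 0x0E-0x1F are not allowed. There are also encoding issues that would prevent you from putting raw binary bytes in CDATA: the document is decoded as characters in its declared encoding (UTF-8 by default), and an arbitrary byte sequence is often not even valid in that encoding.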
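The Char production translates directly into a Perl character class, which gives a quick way to check whether a string could legally appear as XML 1.0 text. This is a sketch; the names is_valid_xml_text and invalid_chars are hypothetical. It works on characters, not bytes, so input read from a file should be decoded first (for example with Encode's decode('UTF-8', $bytes)).

```perl
use strict;
use warnings;

# XML 1.0 "Char" production as Perl character classes:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
my $xml_char     = qr/[\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/;
my $not_xml_char = qr/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/;

# True if every character in $text matches the Char production.
sub is_valid_xml_text {
    my ($text) = @_;
    return $text =~ /\A$xml_char*\z/ ? 1 : 0;
}

# List the offending characters as U+XXXX code points.
sub invalid_chars {
    my ($text) = @_;
    my @bad;
    while ($text =~ /($not_xml_char)/g) {
        push @bad, sprintf 'U+%04X', ord $1;
    }
    return @bad;
}

print is_valid_xml_text("caf\x{e9}") ? "ok\n" : "invalid\n";    # prints: ok
print join(' ', invalid_chars("nul\x00 and bell\x07")), "\n";    # prints: U+0000 U+0007
```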