The designers of XML have saved you from yourself. You aren't even allowed to send a form feed in XML, so you are just crazy to think XML would allow something so insane as sending binary data! Be glad the XML designers had your best interests in mind! If not for their keen insight and concern, you'd be sending binary data already, and boy would you soon regret it!

As I note in Re: Funny characters in nodes (exactly zero), Tim Bray declared "XML dislikes [...] form-feed[s] [etc.] which have exactly zero shared semantics from system to system". Yes, you'll never find two systems in the world that both use "form feed" to represent a page break.

So you need to either invent your own proprietary encoding for the binary data, encode the binary data into XML-approved characters (to ensure "shared semantics", oh the irony), and then teach every party involved this new proprietary encoding. Or you could just find one of the many "XML parsers" (the scare quotes are required by the XML standard) that have the good sense to at least optionally ignore the requirement that they complain about characters that Tim Bray dislikes (something that XML 1.1 will also likely mostly do).

If you can't find such an "XML parser", then you could also just use a simplistic scheme to transform the "not well-formed 'XML'" into XML and then transform all parsed-out values to recover the original binary data. For example, replace any control characters (or other XML-hated characters) and any backslashes with \xx, where "xx" is the hex value of the byte, and then perform the reverse translation on the extracted values. (Working byte by byte, every value fits in two hex digits; the characters XML 1.0 forbids above that range, namely the surrogates and U+FFFE/U+FFFF, don't come up when you treat the data as bytes.)
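Here is a minimal sketch of that \xx scheme in Python, just to show the round trip. The function names (`escape_bytes`, `unescape_text`, `roundtrip`) and the `blob` element are made up for illustration. As an extra assumption beyond what is described above, it also escapes every byte from 0x7F up, so the result is plain printable ASCII and the document encoding stops mattering. Markup characters such as `<` and `&` are left for the XML library to escape in the usual way.

```python
import re
import xml.etree.ElementTree as ET

# Bytes passed through untouched: printable ASCII except the backslash.
# (Assumption: everything else, including bytes >= 0x7F, gets escaped.)
_SAFE = set(range(0x20, 0x7F)) - {0x5C}


def escape_bytes(data: bytes) -> str:
    """Turn arbitrary bytes into text that XML will accept as character data."""
    return "".join(chr(b) if b in _SAFE else "\\%02x" % b for b in data)


def unescape_text(text: str) -> bytes:
    """Reverse escape_bytes: every backslash starts a two-digit hex escape."""
    decoded = re.sub(
        r"\\([0-9a-fA-F]{2})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )
    # Every character is now in the range 0..255, so latin-1 maps it 1:1 to a byte.
    return decoded.encode("latin-1")


def roundtrip(data: bytes) -> bytes:
    """Store the bytes in a (hypothetical) <blob> element, parse it, recover them."""
    elem = ET.Element("blob")
    elem.text = escape_bytes(data)
    xml_text = ET.tostring(elem, encoding="unicode")
    parsed = ET.fromstring(xml_text)
    return unescape_text(parsed.text or "")


if __name__ == "__main__":
    payload = b"page one\x0cpage two \\ <&>"
    print(escape_bytes(payload))
    print(roundtrip(payload) == payload)
```

Because the backslash itself is always escaped (as \5c), any backslash in the parsed text is guaranteed to begin an escape, so the reverse translation is unambiguous. Escaping carriage returns, line feeds, and tabs along with the other control characters also keeps the parser's line-ending normalization from quietly altering the data.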

- tye        


In reply to Re: binary data in XML (semantics) by tye
in thread binary data in XML by sailortailorson
