Thanks, that clarifies some things. Still, the Python code does not mix text and binary.
As far as I can tell from the code, the binary stuff is BASE64 encoded. And yes, unfortunately "encoding" is used for a lot of different things.
Let me try to explain the difference:
- UTF-8 is an encoding to map Unicode characters to bytes. Unicode characters are identified by their code point. For ASCII characters, including control characters like LINE FEED, the code point is equal to the "traditional" byte value, and also to the UTF-8 mapping. Perl's interface identifies characters by their code point, so you get a lowercase Greek alpha by chr(945) or by "\x{3b1}". You can also use the names as in choroba's example: "\N{GREEK SMALL LETTER ALPHA}".
- BASE64 is an encoding to map a stream of bytes, each in the range 0..255 ("binary data"), to a stream of bytes, each of which represents an ASCII character. The result happens to be valid UTF-8 (see above).
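To make the two meanings of "encoding" concrete, here is a minimal sketch using only the core modules Encode and MIME::Base64. The sample bytes passed to pack are arbitrary values I picked for illustration, not data from the thread.

```perl
use strict;
use warnings;
use charnames ':full';                 # needed for \N{...} on older perls
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

# Three spellings of the same character, all meaning code point 945.
my @alpha = (chr(945), "\x{3b1}", "\N{GREEK SMALL LETTER ALPHA}");

# UTF-8 maps this one character to two bytes.
my $alpha_bytes = encode('UTF-8', $alpha[0]);

# An ASCII character such as LINE FEED keeps its byte value under UTF-8.
my $lf_bytes = encode('UTF-8', "\n");

# BASE64 maps arbitrary bytes (illustrative values) to ASCII characters only.
my $binary = pack('C*', 0, 127, 128, 255);
my $b64    = encode_base64($binary, '');   # '' suppresses the trailing newline

printf "alpha as UTF-8: %s\n", unpack('H*', $alpha_bytes);
printf "LF as UTF-8:    %s\n", unpack('H*', $lf_bytes);
printf "BASE64 text:    %s\n", $b64;
```

The BASE64 output contains nothing outside ASCII, which is why it passes through a UTF-8 text channel unchanged.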
Binary data will in most cases contain bytes in the range 128..255. Their UTF-8 encoding is not equal to their byte value. If you encode such bytes in UTF-8, Perl interprets their byte values as code points: Unicode's code points in that range are, (not so) surprisingly, the same as ISO-8859-1. The code point for ö is U+00F6 and its ISO-8859-1 byte is X'F6', but its UTF-8 encoding has two bytes, X'C3B6'. So, if you encode binary data in UTF-8, the result differs from the input, but the process is deterministic and reversible.
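The ö case can be checked with a short round trip. This is only a sketch with the core Encode module; the variable names are mine.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One byte of "binary data" with the value 0xF6 (246).
my $binary = "\xF6";

# encode() treats the byte value as code point U+00F6 and yields two bytes.
my $utf8 = encode('UTF-8', $binary);

# decode() gives back a single character whose code point is 246 again.
my $back = decode('UTF-8', $utf8);

printf "binary:  %d byte(s), hex %s\n",        length($binary), unpack('H*', $binary);
printf "encoded: %d byte(s), hex %s\n",        length($utf8),   unpack('H*', $utf8);
printf "decoded: %d char(s), code point %d\n", length($back),   ord($back);
```

The encoded form is twice as long as the input, yet decoding returns a string that compares equal to the original byte.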
However, it is up to the receiving side to decode a UTF-8 stream into binary data rather than into a Unicode string. Perl happens to do that (because, as you wrote, it makes no difference), but not many other languages do. In general, you cannot decode a UTF-8 stream into binary if it contains one or more characters with a code point greater than 255.
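The limit at code point 255 can be shown in Perl as well. In the sketch below, utf8_stream_to_binary is a hypothetical helper name of my own; it relies on utf8::downgrade, which returns false instead of dying when called with a true second argument.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: turn a UTF-8 byte stream back into binary data.
# Returns undef if any decoded character does not fit into one byte.
sub utf8_stream_to_binary {
    my ($stream) = @_;
    my $chars = decode('UTF-8', $stream);
    # A true second argument makes utf8::downgrade return false on failure.
    return utf8::downgrade($chars, 1) ? $chars : undef;
}

# All code points are below 256, so the original bytes come back.
my $ok = utf8_stream_to_binary(encode('UTF-8', "\x00\x80\xFF"));

# Greek alpha is code point 945 and has no single-byte representation.
my $bad = utf8_stream_to_binary(encode('UTF-8', "\x{3b1}"));

print defined($ok)  ? "recovered bytes: " . unpack('H*', $ok) . "\n"
                    : "stream is not binary\n";
print defined($bad) ? "recovered bytes\n"
                    : "code point above 255, cannot map back to bytes\n";
```

A receiver that wants binary data has to perform such a check explicitly, or use an encoding like BASE64 where the question never arises.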