On perl 5.6.x, which has UTF-8 support but does not ship with Encode, you can use pack. In the templates below, the first character ("C" or "U") sets the type of the packed string and applies to the whole string; the "0" after it is a character/byte count of zero, so that part packs no data itself. "a*" then packs the actual string data.
- To mark the bytes as UTF-8:
$utf8 = pack 'U0a*', $raw;
- To mark a UTF-8 string as raw bytes:
$raw = pack 'C0a*', $utf8;
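A minimal sketch of what the first template does, assuming a two-byte sample string (the UTF-8 encoding of U+00E9) that is not from the post above. The data bytes stay the same; only the way perl counts them changes. utf8::is_utf8 needs perl 5.8.1 or later, so on 5.6.x only the length comparison applies. The reverse template is left out of the sketch because, as far as I can tell from the 5.10 release notes, the C0/U0 semantics of pack changed in that release and 'C0a*' should not be expected to behave the same way there.

```perl
use strict;
use warnings;

# Hypothetical sample: the two bytes that encode U+00E9 in UTF-8.
my $raw = "\xc3\xa9";

# Same bytes, now flagged as UTF-8 character data.
my $utf8 = pack 'U0a*', $raw;

# length() counts bytes for $raw and characters for $utf8.
printf "raw:  length %d\n", length $raw;
printf "utf8: length %d, flag %s\n",
    length $utf8, (utf8::is_utf8($utf8) ? 'on' : 'off');
```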
Actually, that will still work on perl 5.8.x, though the few (well hidden) functions in Encode are a valid alternative. See "Messing with Perl's Internals" in the Encode documentation. Somehow I get the feeling they don't like you messing with this yourself...
That's a good tip too. I compared the three alternative forms, and here are the results:
              Rate decode pack _utf8_on
decode     57971/s     -- -93%     -95%
pack      781250/s  1248%   --     -30%
_utf8_on 1111111/s  1817%  42%       --
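The benchmark code itself was not posted. A sketch of how such a comparison might be set up with the core Benchmark module follows; the sample string, the run time, and the exact body of each sub are assumptions, so the rates it prints will not match the table.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Encode ();

# Hypothetical sample data: UTF-8 encoded bytes, repeated a few times.
my $raw = "caf\xc3\xa9" x 10;

# One sub per form; each returns the bytes as a UTF-8 flagged string.
my %forms = (
    decode   => sub { my $s = Encode::decode('UTF-8', $raw); $s },
    pack     => sub { my $s = pack 'U0a*', $raw; $s },
    _utf8_on => sub { my $s = $raw; Encode::_utf8_on($s); $s },
);

# Run each form for at least 1 CPU second and print the comparison chart.
cmpthese(-1, \%forms);
```

Note that decode also validates its input, which the other two forms do not, so part of the gap is work the faster forms simply skip.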
Um, am I missing something? What's wrong with Encode::_utf8_on?
Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).
The docs for _utf8_on start with:
_utf8_on(STRING)
[INTERNAL] Turns on the UTF-8 flag in STRING.
....
This implies that the function is not the recommended way, or at least that it may change at some point. Another sign, to me, that it should be avoided is that its name starts with an underscore, which signals a private function.
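Apart from API stability, there is a practical difference worth seeing: the Encode docs state that _utf8_on does not check the string for well-formed UTF-8, whereas decode does. A small sketch, assuming a hypothetical Latin-1 input that is not valid UTF-8:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical input: Latin-1 bytes, where \xe9 is NOT valid UTF-8 here.
my $latin1 = "caf\xe9";

# Public API: decode() validates and substitutes U+FFFD for the bad byte.
my $decoded = Encode::decode('UTF-8', $latin1);

# Internal API: the flag is set on a copy and nothing is validated.
my $flagged = $latin1;
Encode::_utf8_on($flagged);

printf "decoded ends in U+%04X\n", ord substr($decoded, -1);
printf "flagged copy is %s\n",
    (utf8::valid($flagged) ? 'well-formed' : 'malformed');
```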
Indeed. The docs say, a scant six lines (exact figure will vary depending on renderer and font size, of course) above that, "The following API uses parts of Perl's internals in the current implementation. As such, they are efficient but may change."... but I wouldn't worry too much.
These are indeed not part of the public API of the module... but what does that mean, really? It means that they may change without notice. When, is the question, though. They clearly won't change without you upgrading the module, or perl itself. This means that you should have fair warning before they change.
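If the possibility of a change after an upgrade is still a worry, one option is to guard the call. The sketch below is only an illustration: mark_utf8 is a hypothetical helper name, and falling back to the public decode function is my assumption about a reasonable substitute, not something the Encode docs prescribe.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: use the internal call when present, else decode().
sub mark_utf8 {
    my ($bytes) = @_;              # a copy; the caller's string is untouched
    if (Encode->can('_utf8_on')) {
        Encode::_utf8_on($bytes);  # fast path, no validation
        return $bytes;
    }
    return Encode::decode('UTF-8', $bytes);  # public, validating fallback
}

my $text = mark_utf8("caf\xc3\xa9");
printf "%d characters\n", length $text;
```

The two paths only agree for input that is already well-formed UTF-8, so this suits data whose encoding is known in advance.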
But /will/ they change, even then? They've given you fair warning that they may change it. Will they? I doubt it. First off, Dan Kogai isn't the sort of person (I say, at a guess) to lightly break backwards compatibility -- even when he's given you fair warning that he might do so. The function is documented, even if it warns you in the same breath. But more importantly, I don't foresee a reason for it to change. Perl's Unicode handling model is very unlikely to change in the near future in such a way that setting and getting the value of the utf8 flag would no longer be a meaningful thing to do. (Such a change would, in fact, be greatly desirable, but is very unlikely before, at the very least, 5.12.)
/me hopes that wasn't too rambly or heretical.