in reply to Locale Responsibilities
What does decode_utf8 do above? Does it check for UTF-8 compliance and set the utf8 flag? Does it pack 4 octets per 32 bits for binary data and one character per 32 bits for utf8 data?
decode_utf8 converts bytes "\xC3\xA9" into character "\xE9".
Internally, the returned string holds the character in Perl's utf8 representation (a Perl-specific superset of UTF-8), with the UTF8 flag on. For example, the character "\xE9" is stored as the two bytes "\xC3\xA9", UTF8=1.
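A minimal sketch of the above, using only core Encode and the built-in utf8:: functions. The variable names are mine, purely for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Two bytes as they might arrive from a file or socket:
# the UTF-8 encoding of U+00E9.
my $bytes = "\xC3\xA9";

# Decoding turns the two bytes into one character.
my $chars = decode_utf8($bytes);

my $byte_len  = length($bytes);                 # counts bytes
my $char_len  = length($chars);                 # counts characters
my $codepoint = ord($chars);                    # code point of that character
my $flag_on   = utf8::is_utf8($chars) ? 1 : 0;  # internal UTF8 flag

# Encoding is the reverse step: characters back to UTF-8 bytes.
my $round_trip = encode_utf8($chars);

printf "%d bytes in, %d character out, U+%04X, UTF8=%d\n",
    $byte_len, $char_len, $codepoint, $flag_on;
```

The flag is only inspected here to show the internal storage; ordinary code should not need to look at it.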
If the data read in is in binary format, then why did I have to use `use bytes' when searching it with a regex (including when searching binary data)?
You don't.
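As a rough illustration (the sample data and names are made up), a regex matches bytes in a binary string as-is, with no `use bytes' in sight:

```perl
use strict;
use warnings;

# Hypothetical binary buffer, e.g. read from a filehandle in binmode.
# Every element of the string is a byte (value 0..255).
my $data = "\x00\x01PNG\xFF\xD8\xFF\xE0rest";

# Look for a three-byte marker. $-[0] is the offset where the match starts.
my $offset = ($data =~ /\xFF\xD8\xFF/) ? $-[0] : -1;

# A character class over raw byte values works the same way.
my $high_bytes = () = $data =~ /[\x80-\xFF]/g;

print "marker at offset $offset, $high_bytes high bytes\n";
```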
At the time this made sense, but now that I'm having to convert the data to UTF-8, I'm wondering: if it isn't already in utf8 then surely it's binary, so why the need for use bytes?
You're unclear as to whether you're talking about the internal or external encoding. Perhaps Re: Decoding, Encoding string, how to? (internal encoding) would help.
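To make the internal/external distinction concrete, here is a small sketch (names are mine). The same one-character string can be stored internally in two ways without its value changing, and the external encoding you get from encode_utf8 is the same either way.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $s = "\xE9";      # one character, stored internally as one byte
my $t = $s;
utf8::upgrade($t);   # same character, now stored internally as utf8

my $same_value  = ($s eq $t) ? 1 : 0;
my $same_length = (length($s) == length($t)) ? 1 : 0;
my $flag_s      = utf8::is_utf8($s) ? 1 : 0;
my $flag_t      = utf8::is_utf8($t) ? 1 : 0;

# External encoding: what you would actually write to a file or socket.
my $ext_s = encode_utf8($s);
my $ext_t = encode_utf8($t);

print "equal=$same_value flags=$flag_s/$flag_t\n";
```

The two strings differ only in their internal storage; as far as your code is concerned they are the same string.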
Re^2: Locale Responsibilities
by aecooper (Acolyte) on May 24, 2009 at 15:41 UTC
by ikegami (Patriarch) on May 25, 2009 at 17:15 UTC