What does decode_utf8 do beyond checking for UTF-8 compliance and setting the UTF8 flag? Does it pack four octets per 32 bits for binary data and one character per 32 bits for UTF-8 data?

decode_utf8 converts the two bytes "\xC3\xA9" into the single character "\xE9".

Internally, the returned string is stored in utf8 (a Perl-specific superset of UTF-8) with the UTF8 flag on. For example, the character "\xE9" is stored as the two bytes "\xC3\xA9", UTF8=1.
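A minimal sketch of the decode step described above, assuming the core Encode module; the variable names are made up for illustration. It shows that the caller sees one character, whatever the internal storage is.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Two bytes as they might arrive from a file or socket:
# the UTF-8 encoding of U+00E9.
my $bytes = "\xC3\xA9";

# Decode the bytes into a string of characters.
my $chars = decode_utf8($bytes);

# length() and ord() work on characters, not on the internal storage.
printf "bytes: length=%d\n", length($bytes);
printf "chars: length=%d ord=0x%X\n", length($chars), ord($chars);

# utf8::is_utf8 reports the internal storage flag. It is mainly a
# debugging aid; ordinary code should not need to look at it.
my $flag = utf8::is_utf8($chars) ? 1 : 0;
print "UTF8 flag: $flag\n";

# Encoding reverses the decode and gives back the original two bytes.
my $round_trip = encode_utf8($chars);
print "round trip ok\n" if $round_trip eq $bytes;
```

Devel::Peek's Dump($chars) will show the internal buffer and flags if you want to inspect them.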

If the data read in is in binary format, then why did I have to use "use bytes" when searching it with a regex (including when searching binary data)?

You don't.
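A short sketch of searching undecoded data without "use bytes", assuming the data was read without any decoding layer so that every character is in the range 0 to 255. The sample string is made up.

```perl
use strict;
use warnings;

# Undecoded data: each character of the string is one byte of input.
my $data = "\x00\x01PNG\xC3\xA9\xFF";

# A plain regex matches byte values directly; no "use bytes" is needed.
my $found  = 0;
my $offset = -1;
if ($data =~ /\xC3\xA9/) {
    $found  = 1;
    $offset = $-[0];    # start of the match, counted in characters (here, bytes)
}

print "found=$found offset=$offset\n";
```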

At the time this made sense, but now that I'm having to convert the data to UTF-8, I'm wondering: if it isn't already in UTF-8, then surely it's binary, so why the need for "use bytes"?

It's unclear whether you're talking about the internal or the external encoding. Perhaps "Re: Decoding, Encoding string, how to? (internal encoding)" would help.


In reply to Re: Locale Responsibilities by ikegami
in thread Locale Responsibilities by aecooper
