No, I didn't mess it up. I demonstrated exactly what I wanted to demonstrate: that the same character has two different possible internal representations in perl (notice that the two strings compare equal under eq), and that these two representations give different results for substr() under use bytes.
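
A minimal sketch of that effect follows. It is a reconstruction for illustration, not the code from the earlier post: the variable names $x and $y and the choice of U+00E9 are assumptions, and it relies on the built-in utf8::upgrade (perl 5.8 or later) to force the second internal representation.

```perl
use strict;
use warnings;

# The same one-character string (U+00E9), created twice.
my $x = "\xe9";       # stored internally as one Latin-1 byte, UTF8 flag off
my $y = "\xe9";
utf8::upgrade($y);    # same logical string, now stored internally as UTF-8

# At the string level perl sees no difference.
my $equal = ($x eq $y) ? "yes" : "no";

# Under "use bytes", substr() works on the internal buffer instead.
my ($first_x, $first_y);
{
    use bytes;
    $first_x = substr($x, 0, 1);
    $first_y = substr($y, 0, 1);
}

printf "eq: %s\n", $equal;
printf "first byte of x: 0x%02x\n", ord($first_x);
printf "first byte of y: 0x%02x\n", ord($first_y);
```

The two strings compare equal, yet the first byte is 0xe9 for $x and 0xc3 for $y, because the upgraded copy stores the character as the two bytes C3 A9.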

You can, for example, use Dump() from Devel::Peek to see that they are of course different internally. But code shouldn't depend on how a string happens to be encoded internally if that can be avoided.
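
As a rough sketch of that inspection (the strings $x and $y are illustrative assumptions, not the variables from the thread), Dump() can be pointed at two copies of the same character, one of them upgraded with utf8::upgrade. Dump() writes to STDERR, and the exact layout of its output varies between perl versions, so the lines to compare are FLAGS and PV.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

my $x = "\xe9";       # UTF8 flag off
my $y = "\xe9";
utf8::upgrade($y);    # same character, UTF8 flag on

# Compare the FLAGS and PV lines of the two dumps (printed to STDERR).
Dump($x);
Dump($y);

# A lighter-weight look at the same internal detail.
my $x_flag = utf8::is_utf8($x) ? 1 : 0;
my $y_flag = utf8::is_utf8($y) ? 1 : 0;
print "UTF8 flag: x=$x_flag y=$y_flag\n";
```

Like Dump(), utf8::is_utf8 exposes an implementation detail, so it is best kept to debugging rather than program logic.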

Your example, however, leaves both $a and $b with the same internal representation (non-utf8), so of course they print the same. It also no longer has anything to do with my point. Notice that the "but $b is internally in UTF8" claim isn't actually true for your code.

Update, after reading pg's reply:

I wasn't trying to "disprove" use bytes; it obviously does what it is supposed to do. I was, however, trying to show that it makes the result depend on the internal representation. Our disagreement is, first, about whether that's a good idea and, second, about whether that's what the OP wanted. You evidently think he wants the n-th byte of the internal representation of the string, while I assumed that if he's talking about Unicode (which I'm still not sure of), he'd want the n-th byte of the UTF-8 representation of the logical string.
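
One way to get the second reading, sketched under the assumption that the string holds characters rather than already-encoded bytes: encode a copy to UTF-8 explicitly and index into that. The helper name utf8_byte_at is hypothetical; utf8::encode is the built-in that converts a string to UTF-8 octets in place.

```perl
use strict;
use warnings;

# Hypothetical helper: the n-th byte (0-based) of the UTF-8 encoding
# of the logical string, whatever its internal representation.
sub utf8_byte_at {
    my ($str, $n) = @_;
    my $octets = $str;        # work on a copy, leave the caller's string alone
    utf8::encode($octets);    # characters -> UTF-8 octets, in place
    return substr($octets, $n, 1);
}

my $x = "\xe9";       # UTF8 flag off
my $y = "\xe9";
utf8::upgrade($y);    # same character, UTF8 flag on

printf "x: 0x%02x 0x%02x\n", ord(utf8_byte_at($x, 0)), ord(utf8_byte_at($x, 1));
printf "y: 0x%02x 0x%02x\n", ord(utf8_byte_at($y, 0)), ord(utf8_byte_at($y, 1));
```

Both lines report 0xc3 0xa9, since the explicit encoding step removes the dependence on how perl happens to store the string. If the input is binary data rather than text, this step would be wrong and the bytes should be read directly.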


In reply to Re^3: How do I safely, portably extract one or more bytes from a string? by thospel
in thread How do I safely, portably extract one or more bytes from a string? by Anonymous Monk
