The problem is that you mixed up byte context and character context. With just one added line, your demo code can easily be fixed to show the right result:

    #!/usr/bin/perl -lw
    $a="à"; # A high latin1 character, doesn't even need unicode
    print '$a Normal substr: ', ord(substr($a,0,1));
    {
        use bytes;
        print '$a Bytes substr: ', ord(substr($a,0,1));
    }
    {
        use bytes; # I added this
        $b = $a . chr(256);
    }
    chop $b;
    print '$a equals $b, but $b is internally in UTF8' if $a eq $b;
    print '$b Normal substr: ', ord(substr($b,0,1));
    {
        use bytes;
        print '$b Bytes substr: ', ord(substr($b,0,1));
    }

This gives:

    $a Normal substr: 224
    $a Bytes substr: 224
    $a equals $b, but $b is internally in UTF8
    $b Normal substr: 224
    $b Bytes substr: 224
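A minimal sketch of why the added line changes the result, as far as I understand the pragma: under "use bytes", chr() keeps only the low eight bits of its argument, so chr(256) becomes chr(0) and the concatenation never switches the string to the internal UTF-8 form. The variable names here are my own, and "\xE0" is written as an escape so the example does not depend on the encoding of the source file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $char = "\xE0";    # "à" as a single latin1 byte (224)

# Without "use bytes": chr(256) is a wide character, so the
# concatenated string is stored internally as UTF-8.
my $without_bytes = $char . chr(256);

# With "use bytes": chr() keeps only the low eight bits, so chr(256)
# becomes chr(0) and the string stays in its byte form.
my $with_bytes;
{
    use bytes;
    $with_bytes = $char . chr(256);
}

print "without use bytes: ",
    (utf8::is_utf8($without_bytes) ? "UTF-8" : "bytes"), "\n";
print "with use bytes:    ",
    (utf8::is_utf8($with_bytes) ? "UTF-8" : "bytes"), "\n";
```

utf8::is_utf8() only reports how Perl stores the string internally; it says nothing about whether the two strings compare equal as characters.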

Update after reading thospel's reply:

thospel, my point is not to argue with you about encoding or representation. The point is that you tried to use your demo to disprove "use bytes", but it actually did the opposite and showed that "use bytes" is all right. In your case, the first bytes of $a and $b are different, and Perl printed different ord values for them, which shows that "use bytes" works just fine.

All the OP asked is how to safely get the first byte, and "use bytes" is one of the correct ways to do it. I just don't see how your big lesson on encoding relates to the original question. Reading the original post, the author does not sound to me like someone who has no idea about encodings. My feeling is that he knows quite a lot; otherwise he would not have asked the right question in the first place.
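To make that concrete, here is a small sketch of reading the first byte with "use bytes". The helper name first_byte is hypothetical, and utf8::upgrade is used only to build a string that Perl stores internally as UTF-8 without relying on how the source file is encoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the numeric value of the first byte
# of the string as Perl stores it internally.
sub first_byte {
    my ($string) = @_;
    use bytes;    # lexically scoped: substr and ord now work on bytes
    return ord(substr($string, 0, 1));
}

my $latin1 = "\xE0";          # "à" stored as the single byte 0xE0
my $upgraded = "\xE0";
utf8::upgrade($upgraded);     # same character, stored as 0xC3 0xA0

print "equal as characters\n" if $latin1 eq $upgraded;
print "first byte of latin1 string:   ", first_byte($latin1), "\n";    # 224
print "first byte of upgraded string: ", first_byte($upgraded), "\n";  # 195
```

The two strings compare equal as characters, yet their first stored bytes differ, and "use bytes" reports exactly that byte in each case.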

Your demo simply cannot be used to disprove "use bytes", and it is misleading in general.


In reply to Re: Re: How do I safely, portably extract one or more bytes from a string? by pg
in thread How do I safely, portably extract one or more bytes from a string? by Anonymous Monk
