in reply to Re: How do I safely, portably extract one or more bytes from a string?
in thread How do I safely, portably extract one or more bytes from a string?
The problem is that you mixed up byte context and character context. With just one added line, your demo code can easily be fixed to show the right result:
    #!/usr/bin/perl -lw
    $a="à"; # A high latin1 character, doesn't even need unicode
    print '$a Normal substr: ', ord(substr($a,0,1));
    {
        use bytes;
        print '$a Bytes substr: ', ord(substr($a,0,1));
    }
    {
        use bytes; # I added this
        $b = $a . chr(256);
    }
    chop $b;
    print '$a equals $b, but $b is internally in UTF8' if $a eq $b;
    print '$b Normal substr: ', ord(substr($b,0,1));
    {
        use bytes;
        print '$b Bytes substr: ', ord(substr($b,0,1));
    }
This gives:
    $a Normal substr: 224
    $a Bytes substr: 224
    $a equals $b, but $b is internally in UTF8
    $b Normal substr: 224
    $b Bytes substr: 224
Update after reading thospel's reply:
thospel, my point is not to argue with you about the encoding or representation. The point is that you tried to use your demo to discredit "use bytes", but it actually did the opposite and showed that "use bytes" is all right. In your case, the first bytes of $a and $b are different, and Perl did print different ord values, so it showed that "use bytes" is just fine.
All the OP asked is how to safely get the first byte, and "use bytes" is one of the correct ways to do it. I just don't get how your big lesson on encoding relates to the original question. Reading the original post, the author does not sound to me like someone who has no idea about encoding. My feeling is that he knows quite a lot, otherwise he would not even have asked the right question.
Your demo simply cannot be used to discredit "use bytes", and is misleading in general.
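To make the "first byte" point concrete, here is a minimal sketch, not code from the thread. The helper name first_byte is hypothetical, and the sketch assumes the string is built with chr(224) rather than a literal so the result does not depend on how the source file is saved. It reads the first byte of whatever internal representation the string currently has:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# first_byte is a hypothetical helper name, used only for illustration.
# It returns the ordinal of the first byte of the string's internal
# representation, whatever that representation happens to be.
sub first_byte {
    my ($string) = @_;
    use bytes;    # lexically scoped: substr and ord see bytes from here on
    return ord(substr($string, 0, 1));
}

# chr(224) is "à" stored as a single byte.
my $latin1 = chr(224);

# Appending chr(256) outside "use bytes" upgrades the string to UTF-8
# internally; chop removes that character but the upgrade remains.
my $upgraded = $latin1 . chr(256);
chop $upgraded;

print 'same characters: ', ($latin1 eq $upgraded ? 'yes' : 'no'), "\n";
print 'first byte of $latin1:   ', first_byte($latin1), "\n";
print 'first byte of $upgraded: ', first_byte($upgraded), "\n";
```

The two strings compare equal as characters, yet the helper reports 224 for the first and 195 for the second, because the second is held as the two UTF-8 bytes 0xC3 0xA0. That is the same difference in ord values discussed above.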
Re^3: How do I safely, portably extract one or more bytes from a string?
by thospel (Hermit) on Nov 29, 2003 at 06:05 UTC