in reply to Re: Character in 'b' format wrapped in unpack
in thread Character in 'b' format wrapped in unpack
when you pass a numeric value greater than 255 to chr, it must return a wide character.
There is no "must" about it. Unless I specifically ask for Unicrap, characters should be assumed to be 8 bits.
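For anyone following along, here is a minimal sketch of the behaviour being argued about. The value 0xAB is just an arbitrary byte picked for illustration, and utf8::is_utf8 only reports Perl's internal flag, which is used here as a rough indicator of a wide character. The point is that chr quietly hands back a character above 255, and unpack then wraps it to its low 8 bits with the warning from the thread title.

```perl
use strict;
use warnings;

my $narrow = chr(0xAB);         # code point fits in one byte
my $wide   = chr(0xAB << 1);    # 0x156 is above 255, so chr returns a wide character

printf "narrow: ord=0x%X utf8 flag=%s\n",
    ord($narrow), utf8::is_utf8($narrow) ? 'on' : 'off';
printf "wide:   ord=0x%X utf8 flag=%s\n",
    ord($wide), utf8::is_utf8($wide) ? 'on' : 'off';

# Collect warnings instead of letting them go to STDERR, so the wrap is visible.
my @warnings;
my $bits = do {
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    unpack 'B*', $wide;         # unpack only uses the low 8 bits (0x56)
};

print "bits: $bits\n";
print "warning: $_" for @warnings;
```

The bit string that comes back is the one for 0x56, so the data is what a byte shift would have produced, but only by way of a warning.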
I'm afraid I don't quite understand the reason(s) for what happens when the "use bytes" pragma is added -- if I've done it right, the only difference is to eliminate the warning message about the "wrapped character in unpack"
You're right. It does just enough to lull you into a false sense of security; then sneaks around behind and kicks you in the nuts!
Rant:
People complain about the effects that the inclusion of threads has on code for those who don't use them -- a modest increase in executable size and a single-digit percentage of outright performance in tests deliberately designed to show it -- but the deleterious effects of the inclusion of Unicrap are far more pervasive and damaging.
Not only does it bloat the source code and executable, and hit the performance of just about every operation even when you're not using it; it subtly (and often silently) changes the semantics of code that isn't even text processing, let alone Unicrap processing.
The use utf8 pragma was the right way to go. Without it, byte semantics; with it, you made your own bed, so lie in it.
But then some bright spark came along and decided he could make it transparent; and now we're all f*****!
Is it the case that you got the particular pattern of zeros and ones you expected, and were just complaining about the warning message?
No. I wanted the shift to discard the high bit, as it does with integers:
    $n <<= 1; print unpack 'B*', pack 'N', $n;;
    10101011010101001010101101010100
    $n <<= 1; print unpack 'B*', pack 'N', $n;;
    01010110101010010101011010101000
    $n <<= 1; print unpack 'B*', pack 'N', $n;;
    10101101010100101010110101010000
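A rough sketch of how to get that discard-the-high-bit behaviour for strings by masking explicitly, rather than relying on chr or unpack to do the right thing. The helper name shl_each_byte is made up for this example, and it assumes each byte is shifted on its own with no carry into its neighbour. The starting integer 0xAB54AB54 is simply the first pattern printed above.

```perl
use strict;
use warnings;

# Hypothetical helper: shift every byte of a string left by one bit,
# discarding each byte's high bit (no carry between bytes).
sub shl_each_byte {
    my ($bytes) = @_;
    return pack 'C*', map { ($_ << 1) & 0xFF } unpack 'C*', $bytes;
}

my $s = "\xAB\x54";
my $t = shl_each_byte($s);
print unpack('B*', $s), "\n";
print unpack('B*', $t), "\n";

# The integer equivalent, with an explicit 32-bit mask so the result
# does not depend on pack 'N' truncating a wider integer.
my $n = 0xAB54AB54;
$n = ($n << 1) & 0xFFFFFFFF;
print unpack('B*', pack 'N', $n), "\n";
```

Because nothing above 0xFF ever reaches pack, the result stays a plain byte string and no "wrapped" warning is raised.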
Unfortunately, Unicrap (and Perl's implementation of Unicrap) conspire such that you can no longer rely upon simple byte semantics.
The idea that a string (a good old array of bytes) can suddenly contain a random Unicrap character in a program that doesn't (and doesn't want to) use any Unicrap, is a farce!
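If you want to catch the moment a wide character sneaks into what is meant to be a byte string, one possible guard looks like this. The name is_byte_string is invented for the example; it just checks that no character has a code point above 255.

```perl
use strict;
use warnings;

# Hypothetical guard: true when every character in the string fits in one byte.
sub is_byte_string {
    my ($s) = @_;
    return $s !~ /[^\x00-\xFF]/;
}

my $clean = "\xAB\x54";
my $dirty = "\xAB" . chr(0x156);    # chr above 255 introduces a wide character

for my $str ($clean, $dirty) {
    printf "length %d: %s\n", length($str),
        is_byte_string($str) ? 'bytes only' : 'contains a wide character';
}
```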