in reply to Building binary strings.
However, since they represent the UTF-8 encoding of some characters, consider working with code-points like this:

    use Encode;
    ...
    $x = chr(65533);
    $y = Encode::encode('utf-8', $x);   # -> "\x{ef}\x{bf}\x{bd}"

When dealing with perl strings, it is helpful to keep in mind the following:
1. a perl string is just an array of numbers, and the numbers (characters) can be interpreted either as Unicode code-points or as byte values
2. if the characters (numbers) in a string are meant to be interpreted as code-points, we call the string "text", and if they are meant to be interpreted as byte values, we call it "binary data".
The point is that the string "\x{aa}\x{42}\x{fe}" can be interpreted as either three Unicode code-points (U+00AA, U+0042, U+00FE) or as three bytes (0xaa, 0x42, 0xfe), and only the programmer knows what the correct interpretation is.
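To make the two readings concrete, here is a minimal sketch (the variable names are mine, chosen only for illustration). It takes the same three-character string and shows which bytes you end up with under each interpretation: unchanged if the string is binary data, UTF-8 encoded if it is text.

```perl
use strict;
use warnings;
use Encode ();

my $s = "\x{aa}\x{42}\x{fe}";

# Either way, perl just sees three numbers.
my @numbers = map { ord } split //, $s;

# Interpretation 1: binary data. The numbers are the bytes themselves.
my $as_bytes = $s;

# Interpretation 2: text. The numbers are code-points, so they have to be
# encoded (here as UTF-8) before they become bytes.
my $as_utf8 = Encode::encode('UTF-8', $s);

# Helper for display only: hex dump of each character in a string.
sub hex_dump { join ' ', map { sprintf '%02x', ord } split //, shift }

printf "numbers:         %s\n", join ' ', @numbers;
printf "bytes if binary: %s\n", hex_dump($as_bytes);
printf "bytes if text:   %s\n", hex_dump($as_utf8);
```

Nothing in $s itself records which of the two outputs is the right one; that knowledge lives only in the program around it.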
Here are some examples of the difference. If a string (say $x) is meant to contain code-points, then the following uses of $x are logically incorrect even if perl does not report an error:

    $y = Encode::decode('some encoding', $x);
    binmode STDOUT, ":bytes"; print $x;
    ...

Conversely, if $x contains byte values, the following are incorrect uses of $x:

    $y = Encode::encode('some encoding', $x);
    $n = rindex($x, "\N{WHITE SMILING FACE}");   # need: use charnames ':full';
    ...

In these cases, perl may return a result, but the result is meaningless.
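As a rough illustration of "a result, but a meaningless one", the sketch below encodes a string that already holds byte values. The variable names are made up for the example, and I am assuming UTF-8 as the encoding. Perl raises no error; it simply treats each byte as a code-point and encodes it a second time.

```perl
use strict;
use warnings;
use Encode ();

my $text  = chr(0xFFFD);                       # text: one code-point
my $bytes = Encode::encode('UTF-8', $text);    # binary data: three bytes

# Incorrect use: $bytes already contains byte values, but encode() reads
# each of them as a code-point and encodes it again.
my $garbage = Encode::encode('UTF-8', $bytes);

printf "text:    %d character(s)\n", length $text;
printf "bytes:   %d byte(s)\n",      length $bytes;
printf "garbage: %d byte(s)\n",      length $garbage;

# Decoding once only undoes the second encode: we get the bytes back,
# not the original text.
my $once = Encode::decode('UTF-8', $garbage);
print "decoding once gives back the bytes, not the text\n"
    if $once eq $bytes && $once ne $text;
```

This is the usual "double encoding" mistake: the program runs cleanly, and the damage only shows up later when someone looks at the output.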
Hope this helps. Or better yet, hope this generates some more questions :-)