This is just a gentle heads-up that may save someone else from spending a long time tracking down sporadic mismatches when comparing or searching strings containing arbitrary binary data.
If, like me, you read this from the use bytes pod:
The use bytes pragma disables character semantics for the rest of the lexical scope in which it appears. no bytes can be used to reverse the effect of use bytes within the current lexical scope. Perl normally assumes character semantics in the presence of character data (i.e. data that has come from a source that has been marked as being of a particular character encoding). When use bytes is in effect, the encoding is temporarily ignored, and each string is treated as a series of bytes.
to mean that any string comparisons or searches taking place under the auspices of use bytes would be exempt from Unicode considerations, then be warned: they aren't if the regex engine is involved!
Whether this is by design (why?) or oversight (amazing!), it is possible to search a string and get matches at apparently random places that simply defy explanation, until you start looking at the data in terms of characters and not bytes. Very confusing, especially when you've taken the precaution of placing the code in a use bytes block.
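For anyone chasing the same symptom, here is a minimal sketch of how the character/byte split can be made visible. It does not reproduce the regex-under-use-bytes behaviour itself, which may vary between Perl versions. It only shows that the same data reports different lengths and match offsets depending on whether it is held as characters or as octets. The sample string and the variable names are made up for illustration, and taking an explicit octet copy with utf8::encode is one possible approach, not necessarily the right fix for every case.

```perl
use strict;
use warnings;

# Hypothetical sample: one wide character followed by plain ASCII.
# The wide character forces Perl to store the string as UTF-8 internally.
my $data = "\x{100}abc";

my $chars = length $data;              # counted in characters
my $octets;
{
    use bytes;
    $octets = length $data;            # counted in bytes of internal storage
}

# Take an explicit octet copy, so nothing downstream can treat it as characters.
my $copy = $data;
utf8::encode($copy);                   # in place: characters become UTF-8 octets

# The same pattern matches at different offsets in the two forms.
my ($char_pos, $byte_pos);
$char_pos = $-[0] if $data =~ /abc/;   # offset in characters
$byte_pos = $-[0] if $copy =~ /abc/;   # offset in octets

printf "flagged: %s / %s\n",
    (utf8::is_utf8($data) ? "yes" : "no"),
    (utf8::is_utf8($copy) ? "yes" : "no");
print "chars=$chars octets=$octets char_pos=$char_pos byte_pos=$byte_pos\n";
```

If utf8::is_utf8 reports true for data you believed was raw binary, that is a strong hint the mismatches come from character semantics rather than from the data itself.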
In reply to Warning: Unicode bytes! by BrowserUk