in reply to Re: Another use bytes and length issue
in thread Another use bytes and length issue

You guys are AWESOME! All your comments gave me insight into an issue I've been pondering for a few months, and thanks to you I've found a solution. In particular, Joost mentioned that use bytes messes up the utf8 encoding flags, and graff pointed me to the perl docs for more on thinking in utf8 versus bytes. Somehow this got me thinking in a different direction, which led me to this:
http://ahinea.com/en/tech/perl-unicode-struggle.html.
In particular, check out this:
$ustring2 = pack "U0C*", unpack "C*", $ustring2;

Whenever I did $string = chr(20000), the length function counted 1 character. When I had input from a form, the length function counted in bytes. Somehow my form input function was messing up the flags.
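
To make the difference concrete, here is a minimal sketch of what I think was going on. It assumes the form layer hands back raw, undecoded UTF-8 octets (the $form_bytes value is a stand-in, not my real input function), and it uses decode from the core Encode module rather than the pack idiom above to turn those octets into a character string:

```perl
use strict;
use warnings;
use Encode qw(decode);

# A character string built inside Perl: one code point (U+4E20).
my $char_string = chr(20000);

# Stand-in for form input (assumption): the same character arriving
# as three raw UTF-8 octets that nobody has decoded yet.
my $form_bytes = "\xE4\xB8\xA0";

# Decoding turns the octets into a one-character string.
my $decoded = decode('UTF-8', $form_bytes);

my $len_char    = length $char_string;   # counts characters
my $len_raw     = length $form_bytes;    # counts octets, string was never decoded
my $len_decoded = length $decoded;       # counts characters again

print "chr(20000): $len_char, raw form bytes: $len_raw, decoded: $len_decoded\n";
```

So length itself is consistent; what differs is whether the string it gets has been decoded into characters or is still a run of octets.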

Apparently use bytes may have been working within scope, but the string we were analyzing had messed up utf8 flags. Putting a no bytes before and a use bytes after my few lines of counting characters keeps everything working super.
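
Roughly, the scoping looks like the sketch below. The variable names are made up for illustration, and the string is built before use bytes takes effect because chr itself behaves differently under the pragma. Note also that the bytes documentation discourages using the pragma for anything other than debugging, so treat this as a picture of what we did and not a recommendation:

```perl
use strict;
use warnings;

# Built before 'use bytes' takes effect: 3 characters,
# stored internally as 5 bytes of UTF-8.
my $name = chr(20000) . "ab";

use bytes;    # byte semantics from here to the end of the enclosing scope

my $byte_count = length $name;       # counts internal bytes

my $char_count;
{
    no bytes;                        # character semantics inside this block only
    $char_count = length $name;      # counts characters
}

my $byte_count_again = length $name; # back under 'use bytes'

print "bytes: $byte_count, chars: $char_count, bytes again: $byte_count_again\n";
```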

So why do we want to use bytes? Our application counts the number of bytes in a message, wraps it in a wrapper, and tells some other component how many bytes to expect. If the count is wrong, the message is rejected and components get shut down. We were uploading files with Chinese characters in the filename and the application seemed to shut down. We traced this problem to the length function counting characters instead of bytes. Since most string functions we use deal with byte counts, we'd like to use bytes and make character counts the exception.
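
For anyone with the same kind of byte-counting wrapper, one alternative to use bytes is to encode explicitly and take the length of the encoded octets. This is only a sketch: wrap_message and the "LEN:" header are hypothetical and are not our real wire format, and it assumes the payload goes out as UTF-8:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical wrapper: the "LEN:<n>\n" header is illustrative only.
sub wrap_message {
    my ($text) = @_;
    my $octets = encode('UTF-8', $text);  # the exact bytes that go on the wire
    my $size   = length $octets;          # byte count, since $octets holds octets
    return "LEN:$size\n" . $octets;
}

my $filename = chr(20000) . ".txt";       # 5 characters, 7 bytes as UTF-8
my $wrapped  = wrap_message($filename);

print length($filename), " characters wrapped as ", length($wrapped), " bytes total\n";
```

The count stays right whatever state the utf8 flag is in, because length is only ever asked about a string that is known to be octets.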