The OP is not introducing Unicode or mentioning his locale anywhere in his code. The scalars coming from the OP's socket will have byte semantics, so why would any scalars be upgraded to Unicode in his code? The OP claims the value length() returns is the number of bytes in $datagram. $datagram isn't marked as UTF-8 (the UTF8 flag is off), and he didn't say he is using -C.
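
For anyone who wants to check this, here is a minimal sketch. It assumes a hard-coded string stands in for what the socket read would return (the OP's real socket code isn't shown, so the contents of $datagram are made up). It uses utf8::is_utf8() to look at the flag and length() to count, before and after an explicit decode:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the bytes a socket read would hand back.
# These are the 5 bytes of "café" encoded as UTF-8.
my $datagram = "caf\xC3\xA9";

# A plain byte string: UTF8 flag off, length() counts bytes.
my $bytes_flagged = utf8::is_utf8($datagram) ? 1 : 0;
my $byte_len      = length $datagram;

# Only an explicit decode gives character semantics.
my $chars         = decode('UTF-8', $datagram);
my $chars_flagged = utf8::is_utf8($chars) ? 1 : 0;
my $char_len      = length $chars;

printf "raw:     flag=%d length=%d\n", $bytes_flagged, $byte_len;
printf "decoded: flag=%d length=%d\n", $chars_flagged, $char_len;
```

Unless something in the program decodes the data like this (or an I/O layer or -C does it implicitly), length($datagram) should keep reporting bytes.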