The OP is not introducing Unicode or mentioning his locale anywhere in his code. The scalars coming from the OP's socket will have byte semantics. Why would any scalars be upgraded to Unicode in his code? The OP says that what length() returns is the number of bytes in $datagram. $datagram isn't marked as UTF-8. He didn't say he is using -C.
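For anyone who wants to check this on their own data, here is a minimal sketch. The byte values are made up and merely stand in for whatever recv() would have put in $datagram; nothing here is the OP's actual code. It shows that a scalar holding raw octets is not UTF-8 flagged, that length() counts its bytes, and that the count only changes after an explicit decode.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for a datagram read from a socket: five raw octets.
my $datagram = "\x01\x02\xC3\xA9\xFF";

# utf8::is_utf8 reports whether the scalar carries the internal UTF-8 flag.
my $flagged = utf8::is_utf8($datagram) ? 1 : 0;
my $len     = length $datagram;
printf "flag=%d length=%d\n", $flagged, $len;

# Only an explicit decode (or -C, an :encoding layer, etc.) yields characters.
my $bytes     = "\xC3\xA9";                 # UTF-8 encoding of U+00E9
my $chars     = decode('UTF-8', $bytes);    # now a one-character string
my $bytes_len = length $bytes;
my $chars_len = length $chars;
printf "bytes=%d chars=%d\n", $bytes_len, $chars_len;
```

If the flag comes back set on real socket data, something upstream (a -C switch, an I/O layer, a decode call) put it there.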
unpack makes it fairly clear that the code is dealing with octet (byte) oriented data, without needing any further context. substr implies string handling, with the possibility of UTF-8 or other encoding confusion. It's not that substr is flat out wrong in this context, just that it doesn't send as clear a message as unpack or the use of \C in a regex.
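A small side-by-side sketch of the difference in intent. The datagram layout (16-bit big-endian type, 16-bit big-endian length, then payload) is purely an assumption for illustration and has nothing to do with the OP's real protocol. The \C variant is left out because, as far as I know, newer perls (5.24 and later) no longer accept \C in a regex.

```perl
use strict;
use warnings;

# Hypothetical layout: type (n), payload length (n), payload (a*).
my $datagram = pack('n n a*', 0x0102, 5, 'hello');

# unpack: the template itself says "these are octets, laid out like this".
my ($type, $plen, $payload) = unpack('n n a*', $datagram);
printf "type=0x%04X len=%d payload=%s\n", $type, $plen, $payload;

# substr: works on this byte string too, but yields a 2-character string
# that still has to be converted, and the reader has to know the scalar
# holds bytes rather than decoded text.
my $type_raw        = substr($datagram, 0, 2);
my $type_via_substr = unpack('n', $type_raw);
printf "type via substr=0x%04X\n", $type_via_substr;
```

Both routes give the same number here; the unpack version just makes the byte-level intent obvious at a glance.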