From your examples, this isn't a problem with substr but with making sure the utf8 flag is on the data. I suspect
some XS code (either the getline or the ...) is not setting
the flag correctly. (What class is that getline in? What class is $io?) You can put Encode::is_utf8($scalar)
checks through your code to figure out where it's being lost,
and, if necessary, call Encode::_utf8_on($scalar).
Update: the suggestion of using _utf8_on is a temporary workaround and is not a substitute for reporting a bug to the author if a module is not correctly handling UTF-8 input.
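
A minimal sketch of what those checks might look like. The helper name utf8_flag_report and the sample string are made up for illustration, and the raw octets stand in for whatever your getline actually returns, which I can't see from here. Note that Encode::_utf8_on does not validate the data, so it is only safe when you already know the bytes are well-formed UTF-8; Encode::decode is the checked alternative.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: describe the UTF8 flag state of a scalar at a named checkpoint.
sub utf8_flag_report {
    my ($label, $scalar) = @_;
    return sprintf "%s: utf8 flag %s, length %d",
        $label, (Encode::is_utf8($scalar) ? "on" : "off"), length $scalar;
}

# Assumption: stand-in for data handed back by XS code without the flag set.
# These are the UTF-8 octets for "caf" followed by U+00E9.
my $bytes = "caf\xc3\xa9";

# Proper fix: decode the octets into characters (this validates the input).
my $chars = Encode::decode('UTF-8', $bytes);

# Temporary workaround: flip the flag on a copy without any validation.
my $patched = $bytes;
Encode::_utf8_on($patched);

print utf8_flag_report("raw",     $bytes),   "\n";
print utf8_flag_report("decoded", $chars),   "\n";
print utf8_flag_report("patched", $patched), "\n";

# substr now works on characters rather than octets.
print "last char matches\n" if substr($patched, 3, 1) eq "\x{e9}";
```

Sprinkle calls like utf8_flag_report between the read and the substr; the first checkpoint where the flag reads "off" is the place to look at (and the module to report).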