"There are numeric properties of utf8 that are quite distinctive (very unlikely to occur in other types of data) ..."
Well, yes, you can usually say that if something decodes cleanly as utf8, it probably *is* utf8. But the same bytes *will* also be valid in an 8-bit "extended ASCII" charset such as ISO-8859-1, or in any other charset that assigns a meaning to all 256 possible values of each octet, so a successful utf8 decode doesn't rule those out (not elegantly put, but I hope you see my point).
And "probably" is not the same as *is*. Is it really a problem in practice? I'm not sure. Maybe not. Hey, I'm just asking, OK? I like imagining things all going wrong - it's my job ... ;-)
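
To make the point about octets concrete, here is a minimal Python sketch. The sample string and the variable names are my own, chosen purely for illustration, and ISO-8859-1 (Python's `latin-1` codec) stands in for "any charset that uses all 256 octet values". It shows that bytes which are valid utf8 also decode without error as Latin-1, while the reverse generally fails.

```python
# Illustrative sample text (an assumption, not from any real data set).
text = "naïve café"
data = text.encode("utf-8")

# Decoding as utf8 round-trips to the original text.
as_utf8 = data.decode("utf-8")

# Decoding the very same bytes as Latin-1 also succeeds, because Latin-1
# maps every octet 0x00-0xFF to a character. The result is just the wrong
# text: each multi-byte utf8 sequence becomes several Latin-1 characters.
as_latin1 = data.decode("latin-1")

# The reverse direction is where utf8's distinctiveness shows up: Latin-1
# bytes containing non-ASCII characters are usually not valid utf8.
latin1_bytes = text.encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(repr(as_utf8), repr(as_latin1), latin1_is_valid_utf8)
```

So a clean utf8 decode is strong evidence rather than proof: the decoder can tell you the bytes are *consistent with* utf8, not that the sender meant them that way.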