Re^4: A Regex for no-break space Unicode Entities

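utf8 sequences are completely distinguishable: in valid utf8, the bytes \302\240 (the encoding of U+00A0) cannot occur inside the encoding of any other character, nor straddle two adjacent characters. The same holds for every utf8 sequence, because lead bytes and continuation bytes come from disjoint ranges.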
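The claim above can be checked by brute force. The following is a minimal sketch in Python rather than Perl, since the property is about the bytes and not the language. The names `NBSP` and `nbsp_collisions` are made up for this illustration. It encodes every Unicode scalar value and looks for any character other than U+00A0 whose encoding could contribute the bytes \302\240.

```python
NBSP = b"\xc2\xa0"  # utf8 encoding of U+00A0, i.e. \302\240 in octal


def nbsp_collisions():
    """Return code points other than U+00A0 whose encoding could yield C2 A0."""
    bad = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates have no utf8 encoding
        enc = chr(cp).encode("utf-8")
        if 0xC2 in enc[1:]:
            # 0xC2 is a lead byte, so it should never appear after the first byte
            bad.append(cp)
        elif enc.startswith(NBSP) and cp != 0xA0:
            # only U+00A0 itself should start with C2 A0
            bad.append(cp)
    return bad


print(nbsp_collisions())  # prints []
```

Because 0xC2 only ever appears as the first byte of a character, a byte-level match on \302\240 in valid utf8 always starts on a character boundary and covers exactly one U+00A0.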

The possible surprises I had in mind were that perl might make other changes if the file contained invalid utf8 or overlong sequences (characters not encoded in the shortest possible number of bytes), or that perl might emit warnings.
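One way to rule out those cases before touching a file is to check that it is strictly valid utf8. This is only a sketch, again in Python, and it says nothing about how perl itself treats such input. The helper name `is_strict_utf8` and the sample byte strings are assumptions made up for the illustration.

```python
def is_strict_utf8(data: bytes) -> bool:
    """True if data decodes as utf8 with no invalid or overlong sequences."""
    try:
        data.decode("utf-8")  # Python's decoder rejects overlong forms
        return True
    except UnicodeDecodeError:
        return False


well_formed = "a\u00a0b".encode("utf-8")  # b"a\xc2\xa0b"
overlong = b"a\xe0\x82\xa0b"              # 3-byte overlong form of U+00A0
truncated = b"a\xc2"                      # lead byte with no continuation byte

for sample in (well_formed, overlong, truncated):
    print(sample, is_strict_utf8(sample))
```

Note that the overlong sample does not contain the byte pair \302\240 at all, so a byte-level substitution would leave it alone, while a lenient decoder might still treat it as a no-break space.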
