Assuming Unicode (UTF-8, specifically), something that I have only passing familiarity with (which, as usual, doesn't preclude me from pontificating about it), the first byte of a 2-byte character never has a high-order bit of 0. That is, the first byte of each pair has a byte value between 128 and 255; you can narrow it down further than that, since continuation bytes and the lead bytes of larger-than-two-byte characters, for example, take up part of that range.
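So, for UTF-8, something like s/[\300-\337][\200-\277]//g should strip out two-byte characters: \300-\337 (0xC0-0xDF) covers the lead bytes of two-byte sequences, and \200-\277 (0x80-0xBF) covers the continuation byte that follows.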
Simply matching on /\w/ doesn't work since you will match some of the bytes that are the second half of a two-byte character.
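Here is a minimal sketch of the idea, assuming the data is a string of raw UTF-8 bytes (not a decoded Perl character string). The sub name strip_two_byte is made up for illustration; the byte ranges are the standard UTF-8 ones: 0xC0-0xDF for the lead byte of a two-byte sequence and 0x80-0xBF for a continuation byte.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every two-byte UTF-8 sequence from a byte string.
# Assumes $bytes holds raw UTF-8 octets, not decoded characters.
sub strip_two_byte {
    my ($bytes) = @_;
    # Lead byte of a two-byte sequence, followed by exactly one continuation byte.
    $bytes =~ s/[\xC0-\xDF][\x80-\xBF]//g;
    return $bytes;
}

# "caf\xC3\xA9" is "cafe" with an accented e encoded as the two bytes C3 A9.
# "\xE2\x82\xAC" is the three-byte euro sign, which is left alone.
my $in  = "caf\xC3\xA9 \xE2\x82\xAC5";
my $out = strip_two_byte($in);
printf "%vX\n", $out;    # dump the remaining bytes in hex
```

Matching the lead byte and the continuation byte explicitly, rather than a lead byte followed by ".", keeps the pattern from eating an unrelated byte when the input is malformed.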
- tye (but my friends call me "Tye")