Actually, it's the other way around. It converts UTF-8 encoded characters into plain 8-bit values, but only for code points in the range 0x80 through 0xFF inclusive. It ignores everything outside that range: anything lower is already ASCII, while anything higher is left as its original multi-byte UTF-8 sequence, which leaves incorrect bytes in the string.
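To make that behavior concrete, here is a minimal sketch in Python (not the Perl regex from the thread, which isn't quoted here) that performs the same kind of conversion on raw bytes. The function name `utf8_to_latin1_bytes` is hypothetical. It assumes valid UTF-8 input. The only sequences it collapses are a lead byte of 0xC2 or 0xC3 followed by one continuation byte, which are exactly the encodings of U+0080 through U+00FF.

```python
import re

# Two-byte UTF-8 sequences encoding U+0080..U+00FF:
# lead byte 0xC2 or 0xC3, then one continuation byte 0x80..0xBF.
_LATIN1_RANGE = re.compile(rb"([\xC2\xC3])([\x80-\xBF])")


def utf8_to_latin1_bytes(data: bytes) -> bytes:
    """Hypothetical helper: collapse UTF-8 sequences for U+0080..U+00FF
    into single Latin-1 bytes.

    ASCII bytes (below 0x80) pass through unchanged. Sequences for code
    points above U+00FF are also left untouched, so the result is not
    valid Latin-1 if the input contained such characters.
    """
    def collapse(m):
        lead, cont = m.group(1)[0], m.group(2)[0]
        # The low 2 bits of the lead byte are bits 7..6 of the code point;
        # the low 6 bits of the continuation byte are bits 5..0.
        return bytes([((lead & 0x03) << 6) | (cont & 0x3F)])

    return _LATIN1_RANGE.sub(collapse, data)
```

For example, `utf8_to_latin1_bytes("café".encode("utf-8"))` returns `b'caf\xe9'`, while a euro sign (U+20AC, bytes E2 82 AC) passes through as three bytes, which is the leftover junk described above.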
Yes, the output is Latin-1, because Unicode's first 256 code points are identical to the 256 characters of Latin-1 (ISO 8859-1).
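The Latin-1 claim is easy to check directly. The Python sketch below (both helper names are hypothetical) confirms that each code point below 256 encodes to a single Latin-1 byte with the same value, and shows what a leftover three-byte sequence turns into when the result is read back as Latin-1.

```python
def latin1_matches_unicode() -> bool:
    """Hypothetical check: every code point 0..255 encodes to one
    Latin-1 byte with the same numeric value, and decodes back."""
    return all(
        chr(cp).encode("latin-1") == bytes([cp])
        and bytes([cp]).decode("latin-1") == chr(cp)
        for cp in range(256)
    )


def leftover_as_latin1(ch: str) -> str:
    """Hypothetical helper: read the UTF-8 bytes of a character as if
    they were Latin-1, which is what happens to characters above U+00FF
    that the conversion leaves in place."""
    return ch.encode("utf-8").decode("latin-1")


print(latin1_matches_unicode())
print(repr(leftover_as_latin1("\u20ac")))
```

The euro sign comes back as three unrelated Latin-1 characters rather than one, so a string containing such characters is not correct Latin-1 after the conversion.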
In reply to Re: Re: Re: regex for utf-8 by John M. Dlugosz, in thread regex for utf-8 by jjohhn