
I see.
I'm new to this encoding stuff, but now I understand: check, guess, try to encode, and if that fails, discard.
At the moment I just want to discard; later, when I have time, I'll do more tests.
But my next question was: if I check for a valid UTF-8 string and discard anything that fails the check, will this discard the string if it is ASCII?

Re^3: Is utf8, ascii ?
by clinton (Priest) on Aug 07, 2007 at 19:43 UTC
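    No. U+0000 to U+007F (the first 128 Unicode characters) are represented in UTF-8 by one byte, the same byte that is used in ASCII. So ASCII (7-bit ASCII, not e.g. ISO-8859-* or Windows-1252) is a subset of UTF-8. Any pure-ASCII string is therefore also valid UTF-8, and a check that discards invalid UTF-8 will keep it.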
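
To make the subset point concrete, here is a minimal sketch in Python of a "keep if valid UTF-8, otherwise discard" filter. The helper name `is_valid_utf8` and the sample strings are made up for illustration and are not from this thread; the only assumption is that the input arrives as raw bytes.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample inputs.
ascii_bytes = "plain ASCII text".encode("ascii")
latin1_bytes = "café".encode("latin-1")  # 0xE9 on its own is not valid UTF-8
utf8_bytes = "café".encode("utf-8")      # the same text as proper UTF-8

# Pure ASCII has the same byte representation in both encodings.
same_bytes = ascii_bytes == "plain ASCII text".encode("utf-8")

# The filter: keep valid UTF-8, discard everything else.
kept = [b for b in (ascii_bytes, latin1_bytes, utf8_bytes) if is_valid_utf8(b)]
```

With these inputs the ASCII and UTF-8 samples should survive the filter, while the ISO-8859-1 bytes are discarded because the lone high byte does not form a valid UTF-8 sequence.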