in reply to Encoding Detection
Likewise, plain 7-bit English text is valid as both ASCII and UTF-8, with no difference in interpretation.
I'd suggest the following, which is not optimal but wins for simplicity:
This suggests an efficient algorithm: interleave a UTF-8 validator with a 0xDF detector. If you can assume your input is what you say it is, you have a fast one-pass function.
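For what it's worth, here is a rough Python sketch of that interleaving. It is only an illustration under assumptions of mine, not a tested detector: the function name guess_hebrew_encoding is made up, I am assuming that byte 0xDF is assigned in ISO-8859-8 (DOUBLE LOW LINE) but unassigned in Windows-1255 (please check the mapping tables in the references), and I am assuming that anything that is neither valid UTF-8 nor contains 0xDF can be treated as Windows-1255.

```python
def guess_hebrew_encoding(data: bytes) -> str:
    """One pass over the bytes: validate UTF-8 and watch for 0xDF together.

    Hypothetical helper for illustration. The fallback to windows-1255
    and the meaning given to 0xDF are assumptions, not established rules.
    """
    utf8_ok = True        # no UTF-8 violation seen so far
    need = 0              # continuation bytes still expected
    lo, hi = 0x80, 0xBF   # allowed range for the next continuation byte
    seen_df = False       # byte 0xDF encountered anywhere
    seen_high = False     # any byte >= 0x80 encountered

    for b in data:
        if b == 0xDF:
            seen_df = True
        if b >= 0x80:
            seen_high = True
        if not utf8_ok:
            if seen_df:
                break     # both questions are settled, stop early
            continue
        if need:
            if lo <= b <= hi:
                need -= 1
                lo, hi = 0x80, 0xBF
            else:
                utf8_ok = False
        elif b < 0x80:
            pass                          # plain ASCII byte
        elif 0xC2 <= b <= 0xDF:
            need = 1                      # two-byte sequence
        elif 0xE0 <= b <= 0xEF:
            need = 2                      # three-byte sequence
            if b == 0xE0:
                lo = 0xA0                 # reject overlong forms
            elif b == 0xED:
                hi = 0x9F                 # reject surrogates
        elif 0xF0 <= b <= 0xF4:
            need = 3                      # four-byte sequence
            if b == 0xF0:
                lo = 0x90                 # reject overlong forms
            elif b == 0xF4:
                hi = 0x8F                 # stay within U+10FFFF
        else:
            utf8_ok = False               # 0x80-0xC1 or 0xF5-0xFF as lead

    if utf8_ok and need == 0:
        return "utf-8" if seen_high else "ascii"
    return "iso-8859-8" if seen_df else "windows-1255"
```

Calling guess_hebrew_encoding(raw_bytes) returns one of the four labels as a string. Note that a valid UTF-8 verdict takes priority here, since 0xDF is also a legal UTF-8 lead byte; that ordering is a design choice of this sketch.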
References:
Unicode's official notion of ISO-8859-8:
ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-8.TXT
Windows-1255 reference (Unicode):
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1255.TXT
Windows-1255 reference (Microsoft):
http://www.microsoft.com/typography/unicode/1255.htm
(Credit for some of the research behind this goes to Anatoly Vorobey.)