Are you really positive that "file.txt" actually contains at least one occurrence of "short_japanese_text_in_utf8", and that it really is encoded in sjis? (And are you being careful to check for distinct white-space characters that might mess up the comparisons?)
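A quick way to answer both questions is to decode the file explicitly and look at the actual code points. Here is a minimal sketch in Python (not the Perl from the thread); the helper names `file_contains` and `describe` are made up for illustration, and it assumes the file really is meant to be Shift_JIS:

```python
import unicodedata


def text_contains(raw, needle, encoding="shift_jis"):
    """Decode raw bytes with the given encoding and test for the needle.

    Raises UnicodeDecodeError if the bytes are not valid in that encoding,
    which is itself a hint that the file is not what you think it is.
    """
    return needle in raw.decode(encoding)


def file_contains(path, needle, encoding="shift_jis"):
    """Same check, reading the bytes from a file such as "file.txt"."""
    with open(path, "rb") as fh:
        return text_contains(fh.read(), needle, encoding)


def describe(s):
    """List every character as 'U+XXXX NAME' so look-alikes stand out."""
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in s
    ]
```

Running `describe` on both the search string and the matching stretch of the file makes a stray ideographic space (U+3000) versus an ASCII space (U+0020) obvious, even though the two look alike on screen.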
I'm not familiar with Japanese, but I wonder if there might be a problem because of the way Unicode handles this language. Because Japanese, Chinese and Korean share a number of "common" ideographic characters, Unicode has created a "unified CJK" character set, which represents, roughly, the union of ideographs used in the three languages. It's conceivable that some "ambiguities" might exist in the sjis-to-Unicode mappings, such that one Unicode code point could reasonably be used in place of two distinct sjis code points, or vice-versa.
This could mean that two strings would look "the same" when viewed by a casual human reader, despite the use of one or another distinct code point.
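One concrete ambiguity of this general kind, offered as an illustration rather than a diagnosis of the original problem, is the Shift_JIS byte pair 0x81 0x60. The sketch below uses Python's `shift_jis` and `cp932` codecs (the JIS-based table and Microsoft's variant) and a hypothetical helper `decode_both` to show that the same bytes can come out as two different code points:

```python
def decode_both(raw):
    """Decode the same bytes with two Shift_JIS mapping tables.

    Returns (text via the "shift_jis" codec, text via the "cp932" codec).
    """
    return raw.decode("shift_jis"), raw.decode("cp932")


# One double-byte character: a wavy dash in Shift_JIS.
as_jis, as_ms = decode_both(b"\x81\x60")

# The two results render almost identically but are different code points,
# so a plain string comparison between them fails.
print("U+%04X" % ord(as_jis), "U+%04X" % ord(as_ms), as_jis == as_ms)
```

If the search string was produced with one table and the file is decoded with the other, the match fails even though both strings look identical to a reader.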
In reply to Re: Unicode problem
by graff
in thread Unicode problem
by edis