in reply to Re^4: MS Access Input -> Japanese Output
Sorry for hijacking your thread, graff, but I think the problem lies in the inner workings of Jcode's getcode() function, which fails to identify UCS-2 under certain circumstances. An example:
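    use strict;
    use warnings;
    use Encode qw(encode);
    use Jcode;    # exports getcode()

    my $a = "\x{3042}";    # Hiragana 'a'

    # Encode explicitly so that unpack("H*") sees UTF-8 bytes on any Perl version
    my $a_utf8 = encode("utf8", $a);
    show_info($a_utf8);    # UTF-8

    my $a_cp932 = encode("cp932", $a);
    show_info($a_cp932);

    my $a_ucs2le = encode("ucs2le", $a);
    show_info($a_ucs2le);

    my $a_ucs2be = encode("ucs2be", $a);
    show_info($a_ucs2be);

    # Print the raw bytes as hex, followed by Jcode's guess at the encoding
    sub show_info {
        my $s   = shift;
        my $hex = unpack("H*", $s);
        my $enc = getcode($s);
        print "hex = $hex\n";
        print "enc = $enc\n\n";
    }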
This prints (comments added):

    hex = e38182
    enc = utf8     # OK

    hex = 82a0
    enc = sjis     # OK

    hex = 4230
    enc = ascii    # wrong

    hex = 3042
    enc = ascii    # wrong
As we can see, the latter two UCS-2 strings are incorrectly identified as "ascii"...
Well, if you think about it, how should the function's heuristics tell apart the single-char UCS-2 strings from their regular two-char ASCII interpretations (i.e. "0B" == "\x30\x42" or "B0" == "\x42\x30")?
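To make the ambiguity concrete, here is a minimal sketch that uses only the core Encode module (the variable names are just for illustration). It checks that the UCS-2 encodings of U+3042 are byte-for-byte identical to the ASCII strings "0B" and "B0", which is why no heuristic can tell them apart without outside knowledge:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $hiragana_a = "\x{3042}";    # Hiragana 'a'

# Each UCS-2 form is exactly two bytes, and both bytes are printable ASCII
my $ucs2be = encode("UCS-2BE", $hiragana_a);    # bytes 0x30 0x42
my $ucs2le = encode("UCS-2LE", $hiragana_a);    # bytes 0x42 0x30

print "UCS-2BE bytes equal '0B': ", ($ucs2be eq "0B" ? "yes" : "no"), "\n";
print "UCS-2LE bytes equal 'B0': ", ($ucs2le eq "B0" ? "yes" : "no"), "\n";

# Once the encoding is known, decoding the "ASCII-looking" bytes is unambiguous
my $bytes     = "0B";
my $roundtrip = decode("UCS-2BE", $bytes);
printf "decoded code point: U+%04X\n", ord($roundtrip);
```

Both comparisons come out as "yes", so the byte string alone carries no information about which reading was intended.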
Personally, I'd just look at the raw byte sequences. Sometimes, "less is more" ;)
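Something along these lines is what I have in mind; treat it as a rough sketch rather than a finished tool. hex_dump() is a hypothetical helper name, not part of Jcode or Encode, and it assumes its argument is already a byte string:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_dump {
    my ($bytes) = @_;
    return join " ", map { sprintf "%02x", $_ } unpack("C*", $bytes);
}

my $char = "\x{3042}";    # Hiragana 'a'

# Dump the same character in each encoding from the example above
for my $name (qw(UTF-8 cp932 UCS-2LE UCS-2BE)) {
    my $bytes = encode($name, $char);
    printf "%-8s %d byte(s): %s\n", $name, length($bytes), hex_dump($bytes);
}
```

Seeing the bytes side by side, together with what you know about where the data came from, is usually enough to decide which encoding you are dealing with.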