in reply to Encoding: my custom encoding fails on one character but works for everything else?!
I tried to replicate your situation, using just the accented "A" characters from latin1 plus the dotless i, and I got the same result you did: decoding "{i}" to "\x{0131}" gave me an empty string, while everything else worked as expected.
I noticed that if the test string for decoding was "{i} " (note the space after the close-curly), it worked just fine, and it didn't lose the space, either. Any other character in that position would work as well.
Then I added one more code point that uses a single character between curlies, to see whether it would behave the same way: the inverted question mark, "{?}". With that entry in the ucm file, both the inverted question mark and the dotless i decoded fine without further ado (no extra character needed in the test string).
So, I can't explain it (maybe some other monk can), but see if this works for you:
    <code_set_name> "daud"
    <mb_cur_min> 1
    <mb_cur_max> 4
    <subchar> \x3F
    CHARMAP
    <U0000> \x00 |0 # NULL
    ...
    #<U007B> \x7B |0 # LEFT CURLY BRACKET
    <U007C> \x7C |0 # VERTICAL LINE
    #<U007D> \x7D |0 # RIGHT CURLY BRACKET
    ...
    # I included the next line, defining "{?}":
    <U00BF> \x7b\x3f\x7d |0 # INVERTED QUESTION MARK
    <U00C0> \x7b\x41\x60\x7d |0 # LATIN CAPITAL LETTER A WITH GRAVE
    ...
    <U0131> \x7b\x69\x7d |0 # LATIN SMALL LETTER DOTLESS I
    END CHARMAP
UPDATE: Regarding the issue of commenting out the curlies in the ucm file (U007B, U007D), this actually seems to me like a Good Idea™ in its own right. If some poor typist, trying to keyboard text using Daud encoding, happens to put curlies around a character or digraph that is not defined in your ucm file, a decode from that into Unicode will yield "\x{fffd}..\x{fffd}": those particular curlies cannot be decoded, so each one becomes the replacement character, and whatever was between them is left unchanged. It's just good to know for sure how to identify errors of this kind.
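To make the expected result concrete, here is a rough Python sketch that imitates the decoding behavior described above. It is not Perl's Encode and says nothing about how Encode works internally. The names `DAUD_TO_UNICODE` and `decode_daud` are made up for illustration, the table holds only the three multi-character entries shown in the ucm excerpt, and I am assuming every other character passes through unchanged.

```python
# Illustrative stand-in for a decoder built from the ucm file; NOT Perl's Encode.
# Only the three multi-character entries shown in the excerpt are included.
DAUD_TO_UNICODE = {
    "{?}": "\u00bf",   # INVERTED QUESTION MARK
    "{A`}": "\u00c0",  # LATIN CAPITAL LETTER A WITH GRAVE
    "{i}": "\u0131",   # LATIN SMALL LETTER DOTLESS I
}

# The bare curlies are commented out in the ucm file, so they have no mapping.
UNMAPPED = {"{", "}"}
REPLACEMENT = "\ufffd"

# Try longer sequences first so "{A`}" is matched before any shorter key.
KEYS = sorted(DAUD_TO_UNICODE, key=len, reverse=True)


def decode_daud(text):
    """Decode Daud-style text, marking stray curlies with U+FFFD."""
    out = []
    pos = 0
    while pos < len(text):
        for key in KEYS:
            if text.startswith(key, pos):
                out.append(DAUD_TO_UNICODE[key])
                pos += len(key)
                break
        else:
            # No defined sequence starts here: a stray curly becomes U+FFFD,
            # anything else is assumed to map to itself.
            ch = text[pos]
            out.append(REPLACEMENT if ch in UNMAPPED else ch)
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    # "{x}" is not defined, so its curlies turn into U+FFFD and "x" survives.
    print(ascii(decode_daud("k{i}z{x}")))
```

Searching decoded text for "\x{fffd}" then points straight at the undefined curly sequences.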