in reply to Re^2: One bird, two Unicode names
in thread One bird, two Unicode names
Re^4: One bird, two Unicode names
by Anonymous Monk on Mar 11, 2011 at 10:11 UTC

My file:///C:/Perl/html/site/lib/Spreadsheet/ReadSXC.html suggests this

That correctly forces most of the file that seems to be in latin-1 into UTF-8, at least for the lower code points, for example

But it fails on the higher code points, e.g. "'" in latin-1 does not (unsurprisingly) turn into RIGHT SINGLE QUOTATION MARK (8217).

Instead, the latin-1 turns into this

which does not equal the name of the same bird in the UTF-8 coded file.

Richard H
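For anyone following along, here is a minimal sketch of why a latin-1 decode would not produce RIGHT SINGLE QUOTATION MARK. It assumes the apostrophe is stored as the single byte 0x92, which is how Windows-1252 encodes U+2019; that byte value is my assumption and is not confirmed anywhere in this thread. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# ASSUMPTION: the "curly" apostrophe arrives as the single byte 0x92.
# That is its Windows-1252 (cp1252) encoding; in ISO-8859-1 (latin-1)
# the same byte is a C1 control character.
my $byte = "\x92";

# Decode the same byte under both encodings.
my $as_cp1252 = decode('cp1252',     $byte);
my $as_latin1 = decode('iso-8859-1', $byte);

printf "cp1252:  U+%04X\n", ord $as_cp1252;   # U+2019 RIGHT SINGLE QUOTATION MARK
printf "latin-1: U+%04X\n", ord $as_latin1;   # U+0092 (control character)
```

If the data really does contain 0x92, decoding it as cp1252 rather than latin-1 is what yields code point 8217.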
by ikegami (Patriarch) on Mar 11, 2011 at 20:10 UTC
Spreadsheet::ReadSXC uses XML::Parser which properly decodes.
Could you provide me the output from either of the following
or
(preferably the former) for both versions of the string?

Update: Looks like you already did. I followed up there.

by RCH (Sexton) on Mar 14, 2011 at 16:02 UTC
RichardH

Update

Summary

Here are the summarized results of "use Devel::Peek;"

WITH:- use open ':std', ':encoding(cp1252)';

File AERC*.ods :-
PV = 0x34b5cf4 "G\303\274ldenst\303\244dt's Redstart"\0 [UTF8 "G\x{fc}ldenst\x{e4}dt's Redstart"]

File Pal_*.ods :-
PV = 0x34b5cf4 "G\303\274ldenst\303\244dt's Redstart"\0 [UTF8 "G\x{fc}ldenst\x{e4}dt's Redstart"]

WITHOUT:-

File AERC*.ods :-
PV = 0x45fc33c "G\303\274ldenst\303\244dt\342\200\231s Redstart"\0 [UTF8 "G\x{fc}ldenst\x{e4}dt\x{2019}s Redstart"]

File Pal_*.ods :-
PV = 0x4660024 "G\303\274ldenst\303\244dt's Redstart"\0 [UTF8 "G\x{fc}ldenst\x{e4}dt's Redstart"]

Conclusion

To remove differences between OOorg codings include the line "use open ':std', ':encoding(cp1252)';"
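As a side note, the "WITHOUT" dumps above differ in exactly one character: U+2019 in the AERC file versus a plain ASCII apostrophe in the Pal file. The sketch below shows a different technique from the `use open` line in the conclusion: folding the typographic apostrophe to ASCII before comparing. The two strings are copied from the dumps; `fold_apostrophe` is a hypothetical helper name, not something from Spreadsheet::ReadSXC or from the poster's script.

```perl
use strict;
use warnings;

# The two spellings from the "WITHOUT" dumps, written as code points.
my $aerc = "G\x{fc}ldenst\x{e4}dt\x{2019}s Redstart";  # U+2019 apostrophe
my $pal  = "G\x{fc}ldenst\x{e4}dt's Redstart";         # U+0027 apostrophe

# Hypothetical helper: return a copy of the name with
# RIGHT SINGLE QUOTATION MARK replaced by an ASCII apostrophe.
sub fold_apostrophe {
    my ($name) = @_;
    (my $folded = $name) =~ tr/\x{2019}/'/;
    return $folded;
}

# Compare the names before and after folding.
print "raw equal:    ", ($aerc eq $pal ? "yes" : "no"), "\n";
print "folded equal: ",
    (fold_apostrophe($aerc) eq fold_apostrophe($pal) ? "yes" : "no"), "\n";
```

This only handles the one character seen in the dumps; other typographic punctuation would need its own mapping.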