
Spreadsheet::ReadSXC uses XML::Parser, which decodes its input properly.

$ perl -CSDA -MEncode -MXML::Parser -E'XML::Parser->new(Handlers => { Char => sub { print "$_[1]" } })->parse(encode($ARGV[0], qq{<?xml version="1.0" encoding="$ARGV[0]"?><root>\xC9ric\n</root>}));' iso-8859-1
Éric

$ perl -CSDA -MEncode -MXML::Parser -E'XML::Parser->new(Handlers => { Char => sub { print "$_[1]" } })->parse(encode($ARGV[0], qq{<?xml version="1.0" encoding="$ARGV[0]"?><root>\xC9ric\n</root>}));' UTF-8
Éric

Could you provide me with the output of either of the following

use Devel::Peek; Dump($s);

or

{ use Data::Dumper; local $Data::Dumper::Useqq = 1; print(Dumper($s)); }

(preferably the former) for both versions of the string?

Update: Looks like you already did. I followed up there.
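
For readers who want to try the second request on their own data, here is a minimal sketch. The sample string and the helper name dump_string are made up for illustration; they are not taken from the spreadsheet code. With Useqq enabled, Data::Dumper escapes non-ASCII characters, so a typographic apostrophe (U+2019) becomes visible instead of looking like a plain one.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample: contains U+2019 (right single quotation mark)
# where one might expect an ASCII apostrophe.
my $curly = "G\x{fc}ldenst\x{e4}dt\x{2019}s Redstart";

# Hypothetical helper: return a one-line, escaped dump of a string.
sub dump_string {
    my ($s) = @_;
    local $Data::Dumper::Useqq  = 1;   # print non-ASCII characters as escapes
    local $Data::Dumper::Terse  = 1;   # omit the leading "$VAR1 = "
    local $Data::Dumper::Indent = 0;   # keep the dump on a single line
    return Dumper($s);
}

print dump_string($curly), "\n";
```

Devel::Peek's Dump($s) shows more (the UTF8 flag and the raw bytes), which is presumably why it is the preferred option here; note that it writes to STDERR.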

Re^6: One bird, two Unicode names
by RCH (Sexton) on Mar 14, 2011 at 16:02 UTC
    I'm not sure whether you still want this. Here, fwiw, is the output of the Devel::Peek dump you asked for.
    1 of 2
    Here comes $s, the contents of cell at row 636, column 2 of file .../BirdLists_in_english/AERC WPlist July 2010 version 2.0.ods
    $s = Güldenstädt’s Redstart
    SV = PV(0x201601c) at 0x2020ca0
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
      PV = 0x468ffe4 "G\303\274ldenst\303\244dt\342\200\231s Redstart"\0 [UTF8 "G\x{fc}ldenst\x{e4}dt\x{2019}s Redstart"]
      CUR = 26
      LEN = 27
    SV = PVMG(0x460a77c) at 0x2020ca0
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
      IV = 0
      NV = 0
      PV = 0x45ff86c "G\303\274ldenst\303\244dt\342\200\231s Redstart"\0 [UTF8 "G\x{fc}ldenst\x{e4}dt\x{2019}s Redstart"]
      CUR = 26
      LEN = 175
      MAGIC = 0x418ce64
        MG_VIRTUAL = &PL_vtbl_utf8
        MG_TYPE = PERL_MAGIC_utf8(w)
        MG_LEN = 22

    2 of 2
    Here comes $s, the contents of cell at row 763, column 8 of file .../BirdLists_in_both_languages/53174_Liste_Pal_OccO2008.ods
    $s = Güldenstädt's Redstart
    SV = PVMG(0x460a77c) at 0x2020ca0
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
      IV = 0
      NV = 0
      PV = 0x27bd014 "G\303\274ldenst\303\244dt's Redstart"\0 [UTF8 "G\x{fc}ldenst\x{e4}dt's Redstart"]
      CUR = 24
      LEN = 779
      MAGIC = 0x280913c
        MG_VIRTUAL = &PL_vtbl_utf8
        MG_TYPE = PERL_MAGIC_utf8(w)
        MG_LEN = 22

    RichardH
    Update
    Summary
    Here are the summarized results of the Devel::Peek dumps:

    WITH:- use open ':std', ':encoding(cp1252)';
    File AERC*.ods :-
    PV = 0x34b5cf4 "G\303\274ldenst\303\244dt's Redstart"\0 [UTF8 "G\x{fc}ldenst\x{e4}dt's Redstart"]

    File Pal_*.ods :-
    PV = 0x34b5cf4 "G\303\274ldenst\303\244dt's Redstart"\0 [UTF8 "G\x{fc}ldenst\x{e4}dt's Redstart"]

    WITHOUT:-
    File AERC*.ods :-
    PV = 0x45fc33c "G\303\274ldenst\303\244dt\342\200\231s Redstart"\0 [UTF8 "G\x{fc}ldenst\x{e4}dt\x{2019}s Redstart"]

    File Pal_*.ods :-
    PV = 0x4660024 "G\303\274ldenst\303\244dt's Redstart"\0 [UTF8 "G\x{fc}ldenst\x{e4}dt's Redstart"]

    Conclusion
    To remove the differences between the encodings of the two OpenOffice.org files,
    include the line
    "use open ':std', ':encoding(cp1252)';"
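
For reference, the WITHOUT dumps above differ in a single character, which a short sketch can make explicit. The two strings below are rebuilt from the [UTF8 "..."] parts of those dumps; the helper name diff_chars is made up for illustration. This only shows where the strings differ; it is not a replacement for the fix proposed above.

```perl
use strict;
use warnings;

# Strings rebuilt from the [UTF8 "..."] parts of the WITHOUT dumps.
my $aerc = "G\x{fc}ldenst\x{e4}dt\x{2019}s Redstart";   # AERC*.ods
my $pal  = "G\x{fc}ldenst\x{e4}dt's Redstart";          # Pal_*.ods

# Hypothetical helper: return [position, code point in $x, code point in $y]
# for every character position (up to the shorter length) where they differ.
sub diff_chars {
    my ($x, $y) = @_;
    my @diffs;
    my $n = length($x) < length($y) ? length($x) : length($y);
    for my $i (0 .. $n - 1) {
        my ($cx, $cy) = (substr($x, $i, 1), substr($y, $i, 1));
        push @diffs, [ $i, ord $cx, ord $cy ] if $cx ne $cy;
    }
    return @diffs;
}

for my $d (diff_chars($aerc, $pal)) {
    printf "position %d: U+%04X vs U+%04X\n", @$d;
}
# prints: position 11: U+2019 vs U+0027
```

This matches the dumps: both strings are 22 characters long (MG_LEN = 22), but U+2019 takes three bytes in UTF-8 while the ASCII apostrophe takes one, hence CUR = 26 versus CUR = 24.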