in reply to Win32::OLE and Word checkbox characters

G'day Cody Fendant,

I can comment on the "characters" part. I'm not an MSWin user, so I'm unable to help with the "Win32::OLE and Word" part.

'... all I get, ..., is (hex) C2A0, which I believe is just "non-breaking space".'

Taken as two separate code points, U+00C2 is LATIN CAPITAL LETTER A WITH CIRCUMFLEX (Â) and U+00A0 is NO-BREAK SPACE ( ). You can see both in the PDF: Unicode Code Chart: C1 Controls and Latin-1 Supplement.

Taken as a single code point, U+C2A0 (슠) is in the PDF: Unicode Code Chart: Hangul Syllables. There are no formal names shown for any characters in that block of Unicode characters (AC00–D7AF).

My gut feeling is that this is related to different encodings in the Word and HTML documents. Another monk may be able to help further with that. If you supplied some code showing the conversion from Word to HTML you might get a better answer.

— Ken

Re^2: Win32::OLE and Word checkbox characters
by choroba (Cardinal) on Apr 05, 2019 at 12:45 UTC
    C2 A0 is the UTF-8 encoding of U+00A0 NO-BREAK SPACE:
    $ echo -e '\xc2\xa0' | perl -Mcharnames=:full -CI -wnE 'say charnames::viacode(ord)'
    NO-BREAK SPACE
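
For comparison, here is a minimal sketch of the three ways the bytes C2 A0 have been read in this thread: decoded as UTF-8, decoded byte-by-byte as Latin-1, and taken as the single code point U+C2A0. It assumes the two bytes arrive exactly as quoted in the question; the variable names and the describe() helper are mine, for illustration only, and nothing here touches Win32::OLE.

```perl
use strict;
use warnings;
use Encode qw(decode);
use charnames ':full';

# The two bytes reported in the question (an assumption about the raw data).
my $bytes = "\xC2\xA0";

# Reading 1: the bytes are UTF-8, giving one character.
my $utf8 = decode('UTF-8', $bytes);

# Reading 2: each byte is its own Latin-1 character, giving two characters.
my $latin1 = decode('ISO-8859-1', $bytes);

# Reading 3: C2A0 is itself a code point, not a pair of bytes.
my $codepoint = chr(0xC2A0);

# Hypothetical helper: list "U+XXXX NAME" for every character in a string.
sub describe {
    my ($string) = @_;
    return join ', ',
        map { sprintf 'U+%04X %s', ord($_), charnames::viacode(ord($_)) // '(no name)' }
        split //, $string;
}

print 'as UTF-8:      ', describe($utf8),      "\n";
print 'as Latin-1:    ', describe($latin1),    "\n";
print 'as code point: ', describe($codepoint), "\n";
```

Only the first reading gives a lone NO-BREAK SPACE, which matches what the question suspected. The second reading is the usual source of a stray "Â" in front of the space when UTF-8 output is displayed as Latin-1.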

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]