My version of Word (on Mac OS X) behaves similarly to what linuxer described in the first reply above: the binary storage of the characters in the doc file appears to be UTF-16LE, so that ASCII characters have a null high byte.
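To make the byte layout concrete, here is a minimal sketch using the core Encode module. It only shows what UTF-16LE encoding looks like for an ASCII string and for one code point in the range discussed below; it does not read a real doc file, and the helper name hex_bytes is my own.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Render a byte string as space-separated two-digit hex values.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', unpack '(H2)*', $bytes;
}

# Plain ASCII: every character is followed by a null high byte.
print hex_bytes(encode('UTF-16LE', 'Hi!')), "\n";        # 48 00 69 00 21 00

# A "wingding" code point: keyboard character in the low byte, 0xF0 in the high byte.
print hex_bytes(encode('UTF-16LE', "\x{f021}")), "\n";   # 21 f0
```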

The various "wingding" characters have their keyboard character as the low byte and 0xF0 as the high byte. My installation actually had 3 distinct sets of "wingdings", and each of the three used the exact same numeric range, \x{f021} through \x{f07e}, even though each of the three sets displays a different set of symbols.

That numeric code-point range (\x{f0..}) turns out to be the Unicode "private use" area, which means that Microsoft gets to do whatever they want with those code points. It looks like there must be information elsewhere in the doc file to specify character offset ranges within the text, associating the different regions of the text with different fonts.

As a result, a given code point value in the "\x{f0..}" range could be any of several different symbols, depending on what font is associated with that particular position in the text. If you change a character from "\x{f021}" to "\x{0021}" or vice versa, you will be exchanging a wingding or other special symbol for a standard ASCII character.
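The exchange described above is just an offset of 0xF000 in either direction. Here is a rough sketch of that arithmetic; the sub names are made up for illustration, and it assumes the text is already decoded into Perl characters. It says nothing about which font the result will be displayed in.

```perl
use strict;
use warnings;

my $PUA_BASE = 0xF000;   # offset between an ASCII key and its "\x{f0..}" twin

# Shift printable ASCII (0x21-0x7e) up into the \x{f021}-\x{f07e} range.
sub ascii_to_symbol {
    my ($s) = @_;
    $s =~ s/([\x{21}-\x{7e}])/chr(ord($1) + $PUA_BASE)/ge;
    return $s;
}

# Shift \x{f021}-\x{f07e} back down to the plain keyboard characters.
sub symbol_to_ascii {
    my ($s) = @_;
    $s =~ s/([\x{f021}-\x{f07e}])/chr(ord($1) - $PUA_BASE)/ge;
    return $s;
}

printf "%04x\n", ord ascii_to_symbol('!');     # f021
print symbol_to_ascii("\x{f021}"), "\n";       # !
```

Spaces and anything outside those two ranges pass through unchanged.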

As for why your substitutions are not working, it's because you assumed that M$ would have done the "logically right" thing, using the Unicode code point values for those wingding symbols that Unicode provides. Alas, it seems that this is too much to expect. You have to work out the mapping of the M$ wingding characters to their keyboard keys, map the corresponding ASCII code points into the "\x{f0..}" range, and hope for the best.
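Something along these lines is what I have in mind for the substitution. Treat it as a sketch: the %key_for table is hypothetical and has only two entries (I am assuming "J" and "L" are the keys for the smiling and frowning faces in your Wingdings font), so check every entry against your own installation before relying on it.

```perl
use strict;
use warnings;

# Hypothetical lookup: standard Unicode symbol => keyboard key that shows
# the look-alike glyph in the wingding font. Verify against your own font.
my %key_for = (
    "\x{263A}" => 'J',   # WHITE SMILING FACE  (assumed key)
    "\x{2639}" => 'L',   # WHITE FROWNING FACE (assumed key)
);

# Replace each listed Unicode symbol with its "\x{f0..}" equivalent;
# every other character is left alone.
sub unicode_to_wingding {
    my ($text) = @_;
    $text =~ s/(.)/exists $key_for{$1} ? chr(0xF000 + ord $key_for{$1}) : $1/gse;
    return $text;
}

my $out = unicode_to_wingding("ok \x{263A}");
printf "%04x\n", ord substr($out, -1);   # f04a
```

This only rewrites the code points. The result will still display as a wingding only if that stretch of text is associated with the right font.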

(But you might still run into trouble with the way that fonts are mapped to ranges of text. Maybe there's a way to manage that from the Perl script? I don't know...)

Good luck with that.

(updated to put code tags around the hex code point strings)


In reply to Re: Perl Word and Wingdings by graff
in thread Perl Word and Wingdings by merrymonk
