in reply to Perl Word and Wingdings

My version of Word (on Mac OS X) behaves similarly to what linuxer described in the first reply above: the binary storage of the characters in the doc file appears to be UTF-16LE, so that ASCII characters have a null high byte.
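To make the byte layout concrete, here is a minimal sketch using the core Encode module. It only illustrates what UTF-16LE looks like for these code points; it does not read a doc file, and `utf16le_hex` is a helper name made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-16LE bytes of a string
# as space-separated hex pairs.
sub utf16le_hex {
    my ($text) = @_;
    my $bytes = encode('UTF-16LE', $text);
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

print utf16le_hex('A!'), "\n";          # 41 00 21 00  (null high byte)
print utf16le_hex("\x{f021}"), "\n";    # 21 F0        (low byte first)
```

Note that the low byte comes first, so a hex dump shows an ASCII "!" as 21 00 and its "\x{f021}" counterpart as 21 F0.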

The various "wingding" characters have their keyboard character as the low byte and 0xF0 as the high byte. My installation actually had 3 distinct sets of "wingdings", and each of the three used the exact same numeric range: \x{f021} - \x{f07e} -- even though each of the three sets displays a different set of symbols.

That numeric code-point range (\x{f0..}) turns out to be the Unicode "private use" area, which means that Microsoft gets to do whatever they want with those code points. It looks like there must be information elsewhere in the doc file to specify character offset ranges within the text, associating the different regions of the text with different fonts.

As a result, a given code point value in the "\x{f0..}" range could be any of several different symbols, depending on what font is associated with that particular position in the text. If you change a character from "\x{f021}" to "\x{0021}" or vice-versa, you will be exchanging a wingding or other special symbol with a standard ASCII character.

As for why your substitutions are not working, it's because you assumed that M$ would have done the "logically right" thing, using Unicode code point values for the wingding symbols that are provided in Unicode. Alas, it seems that this is too much to expect. You have to work out the mapping of the M$ wingding characters to their keyboard keys, map the corresponding ASCII code points into the "\x{f0..}" range, and hope for the best.
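The code-point arithmetic for that mapping could be sketched as below. The sub names are hypothetical, and this handles only the numbers: which symbol actually shows up still depends on the font Word has associated with that stretch of text.

```perl
use strict;
use warnings;

# Hypothetical helper: shift printable ASCII (0x21-0x7e) into the
# \x{f021}-\x{f07e} range; everything else (spaces, newlines) is left alone.
sub to_symbol_range {
    my ($text) = @_;
    $text =~ s/([\x21-\x7e])/chr(0xF000 + ord($1))/ge;
    return $text;
}

# Hypothetical inverse: bring \x{f021}-\x{f07e} back down to plain ASCII.
sub from_symbol_range {
    my ($text) = @_;
    $text =~ s/([\x{f021}-\x{f07e}])/chr(ord($1) - 0xF000)/ge;
    return $text;
}

printf "%04x\n", ord to_symbol_range('!');                   # f021
print from_symbol_range(to_symbol_range('a~ b')), "\n";      # a~ b
```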

(But you might still run into trouble with the way that fonts are mapped to ranges of text. Maybe there's a way to manage that from the Perl script? I don't know...)

Good luck with that.

(updated to put code tags around the hex code point strings)

Re^2: Perl Word and Wingdings
by merrymonk (Hermit) on Dec 31, 2008 at 16:48 UTC
    Many thanks for the additional information. As far as I can see I will have to experiment to get to the solution.
    As a start I used the following three lines on a file with a ! in it.
    $search->{Text} = "\x{0021}";
    $replace->{Text} = "\x{f021}";
    $exec_res = $search->Execute({Replace => wdReplaceAll});
    This 'worked', and I was hoping to get a single wingding character. However, I got three characters, for which the HexToolkit gave the following hex values: EF 00 AC 20 A1 00 (these looked like an i with an umlaut, a Euro symbol and an i). How can I get just the one wingding that I want?
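One possible reading of those six bytes, offered as an assumption rather than a confirmed diagnosis: they are exactly what you get if "\x{f021}" is encoded as UTF-8 (three bytes) and each of those bytes is then treated as a separate Windows-1252 character before being stored as UTF-16LE. The sketch below reproduces that chain with the core Encode module only; it does not touch Word, and `hex_bytes` is a helper invented for the example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: format a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# Step 1: the single character U+F021 becomes three bytes in UTF-8.
my $utf8_bytes = encode('UTF-8', "\x{f021}");

# Step 2 (the assumed mistake): those bytes are read as Windows-1252,
# which turns them into three separate characters.
my $misread = decode('cp1252', $utf8_bytes);

# Step 3: the three characters are stored as UTF-16LE.
my $stored = encode('UTF-16LE', $misread);

print hex_bytes($utf8_bytes), "\n";   # EF 80 A1
print length($misread), "\n";         # 3
print hex_bytes($stored), "\n";       # EF 00 AC 20 A1 00
```

If that is what is happening, the character is being damaged on its way from the Perl string to Word, so the thing to look at would be how the string is handed over, not the code point itself.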
Re^2: Perl Word and Wingdings
by merrymonk (Hermit) on Dec 31, 2008 at 17:27 UTC
    This is really an update to my earlier reply.
    I too have three wingding sets in Word.
    I created a file with the first one from each set and then used the HexToolkit to find the hex values.
    The output for the three characters was 28 28 28. That is, all three are the same and each is only one byte.
    I do not know if this helps or hinders but it does suggest that this is not simple!