merrymonk has asked for the wisdom of the Perl Monks concerning the following question:

Now that Christmas is over, I am returning to my problem of using Perl to replace existing
single characters or strings in Word documents.
Several Monks gave very helpful comments that have allowed me to progress.
However, I cannot get ‘ticks’ and other Wingdings symbols into my altered Word document.
I like using the Perl lines
$search-> {Text} = $oldtext; $replace-> {Text} = $newtext; $search-> Execute({Replace => wdReplaceAll});
since these preserve the font (and all other characteristics) of the characters that are identical to whatever is in $oldtext.
I also am comfortable with using definitions of the old text such as "\x{00BD}".
The problem comes with any characters from wingdings.
As a test I created a Word document that contained only a ‘pencil’ pointing from top right to bottom left.
According to Word, this has a hex value of 0021. I then used the following three lines of Perl, hoping to get the next Wingdings character (scissors):
$search-> {Text} = "\x{0021}"; $replace-> {Text} ="\x{0022}"; $exec_res = $search-> Execute({Replace => wdReplaceAll});
$exec_res returned a value of 0 and the replacement had failed.
Searching the net I found a site (http://www.alanwood.net/demos/wingdings.html) that was headed
Wingdings character set and equivalent Unicode characters.
This showed that the Unicode Hex values for the pencil and scissors are U+270F and U+2702 respectively.
Therefore I altered the search and replace lines to
$search-> {Text} = "\x{270F}"; $replace-> {Text} = "\x{2702}";
and as this did not work even tried
$search-> {Text} = 'U' . "\x{270F}"; $replace-> {Text} ='U' . "\x{2702}";
which also failed. How can I overcome this failure?

In my net searches I have found other instances where people have had problems with wingding and similar characters. Sadly none of these gave the answer I was looking for.

Replies are listed 'Best First'.
Re: Perl Word and Wingdings
by linuxer (Curate) on Dec 31, 2008 at 10:36 UTC

    I assume this is a successor of 730863 (Just to link it).

    Just an idea on this. Try to open the Word document in an Hexeditor and check what values are stored for the characters.

    My test with a Word Document (Word 2003 SP3), showed up like this (at address 0x0a00):

    0000a00: 4800 6100 6c00 6c00 6f00 3a00 20f0 21f0  H.a.l.l.o.:. .!.
    0000a10: 0d00 0000 0000 0000 0000 0000 0000 0000  ................

    The Document contained the String "Hallo: !" with "Hallo: " formatted with font "Bitstream Vera Sans Mono" and the "!" formatted with "Wingdings".
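To read the dump above as characters rather than raw bytes, the sixteen bytes at 0x0a00 can be decoded in Perl. This is only a sketch: it assumes the text is stored as UTF-16LE (which the null high bytes of the ASCII letters suggest), and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The 16 bytes shown at offset 0x0a00 in the hex dump.
my $bytes = pack 'H*', '480061006c006c006f003a0020f021f0';

# Assumption: the document text is UTF-16LE, so each pair of bytes
# is one character with the low byte stored first.
my $text = decode('UTF-16LE', $bytes);

# List the code point of every decoded character.
my @code_points = map { sprintf 'U+%04X', ord($_) } split //, $text;
print join(' ', @code_points), "\n";
# prints: U+0048 U+0061 U+006C U+006C U+006F U+003A U+F020 U+F021
```

Read this way, the bytes 21 f0 are the single code point U+F021, not 0x21F0.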

    Maybe this helps you with your search.

      Thank you for this reply. You are correct this is a continuation of 730863.
      I have now pursued the idea of using a hex dump tool to get the value of Wingdings characters. I must say that ‘graff’ also suggested this approach.
      Looking at your results and what I found using the HexToolkit (from the same stable as the hex dump tool od), I came to the conclusion that the hex value for the ‘pencil’ (the ! in the Wingdings font) is 21F0.
      Therefore I used the following Perl lines (2BF0 is a Wingdings envelope that started as a + character) on a Word document that contained a Wingdings ‘pencil’.
      $search-> {Text} = "\x{21F0}"; $replace-> {Text} ="\x{2BF0}"; $exec_res = $search-> Execute({Replace => wdReplaceAll});
      Sadly this failed. Has any wise monk any idea of what I should try next?
Re: Perl Word and Wingdings
by jhourcle (Prior) on Dec 31, 2008 at 15:02 UTC
    Searching the net I found a site (http://www.alanwood.net/demos/wingdings.html) that was headed Wingdings character set and equivalent Unicode characters. This showed that the Unicode Hex values for the pencil and scissors are U+270F and U+2702 respectively.

    Not every font has every glyph in it -- so it's quite likely that your second attempt will not work using the wingdings font which places its glyphs in lower slots. Likewise, just because chinese/arabic/cyrillic characters exist in unicode doesn't mean that they'll be in a given font.

    In MacOS X, you can see which fonts have a given unicode character defined by going to the Character Palette (look in Edit -> Special Characters ...), and look in the 'Font Variations' panel (you might need to expand it), and it'll show you which fonts have that given glyph -- in my case, 270F is only in Zapf Dingbats, but 2702 is in Zapf Dingbats and Hiragino. If I use any other font, I get a placeholder to mark it as an unknown character.

    I have no idea how to do similar lookups on other operating systems

Re: Perl Word and Wingdings
by graff (Chancellor) on Dec 31, 2008 at 15:53 UTC
    My version of word (on macosx) behaves similarly to what linuxer described in the first reply above: the binary storage of the characters in the doc file appears to be UTF-16LE, so that ASCII characters have a null high-byte.

    The various "wingding" characters have their keyboard character as the low byte and 0xF0 as the high byte. My installation actually had 3 distinct sets of "wingdings", and each of the three used the exact same numeric range: \x{f021} - \x{f07e} -- even though each of the three sets display a different set of symbols.

    That numeric code-point range (\x{f0..}) turns out to be the Unicode "private use" area, which means that Microsoft gets to do whatever they want with those code points. It looks like there must be information elsewhere in the doc file to specify character offset ranges within the text, associating the different regions of the text with different fonts.

    As a result, a given code point value in the "\x{f0..}" range could be any of several different symbols, depending on what font is associated with that particular position in the text. If you change a character from "\x{f021}" to "\x{0021}" or vice-versa, you will be exchanging a wingding or other special symbol with a standard ascii character.

    As for why your substitutions are not working, it's because you assumed that M$ would have done the "logically right" thing, using Unicode code point values for the wingding symbols that are provided in Unicode. Alas, it seems that this is too much to expect. You have to work out the mapping of the M$ wingding characters to their keyboard keys, map the corresponding ascii code points into the "\x{f0..}" range, and hope for the best.
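The mapping described above (keyboard character in the low byte, 0xF0 in the high byte) can be sketched in a few lines of Perl. The helper name symbol_font_char is hypothetical, and the sketch assumes the \x{f021} - \x{f07e} range reported above applies to the font in question.

```perl
use strict;
use warnings;

# Hypothetical helper: shift a printable ASCII key into the
# private-use range \x{f021} - \x{f07e} described above.
sub symbol_font_char {
    my ($key) = @_;
    die "expected one printable ASCII character\n"
        if length($key) != 1 || ord($key) < 0x21 || ord($key) > 0x7E;
    return chr(0xF000 + ord($key));
}

# '!' is the Wingdings pencil and '"' the scissors, per the question.
my $pencil   = symbol_font_char('!');
my $scissors = symbol_font_char('"');
printf "U+%04X U+%04X\n", ord($pencil), ord($scissors);
# prints: U+F021 U+F022
```

The resulting strings could then be tried as the search and replacement text, though which symbol actually appears still depends on the font attached to that run of text.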

    (But you might still run into trouble with the way that fonts are mapped to ranges of text. Maybe there's a way to manage that from the perl script? I don't know...)

    Good luck with that.

    (updated to put code tags around the hex code point strings)

      Many thanks for the additional information. As far as I can see I will have to experiment to get to the solution.
      As a start I used the following three lines on a file with a ! in it.
      $search-> {Text} = "\x{0021}"; $replace-> {Text} ="\x{f021}"; $exec_res = $search-> Execute({Replace => wdReplaceAll});
      This ‘worked’, and I was hoping to get a single Wingdings character. However, I got three characters, for which the HexToolkit gave the hex values EF 00 AC 20 A1 00 (these looked like an i with an umlaut, a Euro symbol and an i). How can I get just the one Wingdings character that I want?
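One possible explanation for the three characters, not confirmed anywhere in this thread, is an encoding mix-up: if the Perl string "\x{f021}" reached Word as its three UTF-8 bytes and each byte was then read as a Windows-1252 character, the result would be exactly those six bytes. The sketch below reproduces that chain with the core Encode module; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $wanted = "\x{f021}";    # the single character that was intended

# Step 1: the character serialised as UTF-8 (three bytes: EF 80 A1).
my $utf8_bytes = encode('UTF-8', $wanted);

# Step 2 (assumed): those bytes misread as three Windows-1252 characters.
my $misread = decode('cp1252', $utf8_bytes);

# Step 3: the three characters stored as UTF-16LE, as Word does.
my $stored = encode('UTF-16LE', $misread);

my $hex = uc join ' ', unpack '(H2)*', $stored;
print "$hex\n";
# prints: EF 00 AC 20 A1 00
```

If that is what is happening, and assuming the script drives Word through Win32::OLE, one untested thing to try is asking the module to treat strings as UTF-8 with Win32::OLE->Option(CP => Win32::OLE::CP_UTF8) before running the replacement.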
      This is really an update to my earlier reply.
      I too have three Wingdings sets in Word.
      I created a file with the first one from each set and then used the HexToolkit to find the hex values.
      The output for the three characters was 28 28 28. That is all the same and only one byte.
      I do not know if this helps or hinders but it does suggest that this is not simple!