Re^3: Replacing none alpha/numeric characters in a Word document

"Times" is a font, not a character encoding. The font says what the characters look like when displayed or printed; the encoding says which bit patterns (numeric values) are mapped to which characters.
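To make the font/encoding distinction concrete, here is a minimal sketch (not anything from your script) that takes one character and shows its code point plus the bytes it turns into under three different encodings. The helper name hex_bytes is made up for this illustration; encode() comes from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the bytes of $char in $encoding as a
# space-separated hex string.
sub hex_bytes {
    my ( $encoding, $char ) = @_;
    return join ' ', map { sprintf '%02x', $_ }
        unpack 'C*', encode( $encoding, $char );
}

my $char = "\x{2018}";    # LEFT SINGLE QUOTATION MARK

# Same character, same code point, different byte patterns per encoding.
printf "code point: U+%04X\n", ord $char;
printf "%-9s %s\n", $_, hex_bytes( $_, $char ) for qw(UTF-8 UTF-16LE cp1252);
```

The character is "e2 80 98" in UTF-8, "18 20" in UTF-16LE and the single byte "91" in cp1252. None of that changes when you switch the font.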

Your lines ranging from "001 A001" to "256 A256" don't make any sense to me. What was the point of that? Note that "\x{01}" is an ASCII control character, as is everything up through "\x{1f}", and likewise "\x{7f}" -- these will display nothing (or could make your display do weird things). Also, please be careful that you don't confuse "backslash" \ with "forward slash" / -- these are very different things.
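If you want to see control characters in a string instead of having them vanish on screen, something along these lines might help. It is only a sketch; the sub name show_visibly is invented for this example, and it only deals with the ASCII control range (0x00 through 0x1F, plus 0x7F).

```perl
use strict;
use warnings;

# Hypothetical helper: replace each ASCII control character with its
# "\x{..}" notation so that it shows up when printed.
sub show_visibly {
    my ($text) = @_;
    ( my $visible = $text ) =~
        s/([\x00-\x1F\x7F])/sprintf( '\\x{%02X}', ord $1 )/ge;
    return $visible;
}

print show_visibly("A\x01B\x7F"), "\n";
```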

If using the "\x{....}" works in terms of allowing the perl script to insert any Unicode character you want into the doc file, then what more is there to worry about? That's the way to go. You just need to be able to find the hex-numeric unicode code-point value for the characters you want to insert. (That's why I pointed to that "handy tool", to provide a way to search the unicode character table.)

For example, as you should have figured out by now, "\x{2018}" is the "left single quotation mark" and "\x{2019}" is the "right single quotation mark", regardless of whether you are using a Times font, or Courier, or Arial, or Helvetica, or ...
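If you would rather look up code points from within perl than search a character table, the core charnames module can map official Unicode names to code points and back. A rough sketch (the sub name code_point_for is just something I made up here):

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper: return the "\x{....}" notation for an official
# Unicode character name, or undef if the name is not known.
sub code_point_for {
    my ($name) = @_;
    my $cp = charnames::vianame($name);
    return defined $cp ? sprintf( '\\x{%04X}', $cp ) : undef;
}

for my $name ( 'LEFT SINGLE QUOTATION MARK', 'RIGHT SINGLE QUOTATION MARK' ) {
    printf "%-28s %s\n", $name, code_point_for($name) // 'unknown';
}

# Going the other way: code point to name.
print charnames::viacode(0x00BD), "\n";
```

This needs the exact official name, so a searchable character table is still the easier route when you only know what the character looks like.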

As for going "via the hex dump route", if you have "unix tools for windows", you can check out either "od" or "xxd" (though I am sure there are hexdump tools that are "native" to windows, as well). Or you can whip up something pretty easily in perl -- here's a basic/simple hexdump tool:

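#!/usr/bin/perl
use strict;
use warnings;

die "Usage:  $0  file_name\n" unless ( @ARGV == 1 and -f $ARGV[0] );

open( my $in, "<", $ARGV[0] ) or die "Cannot open $ARGV[0]: $!\n";
binmode $in;
my $offset = 0;
# Read 16 bytes at a time; print the offset, the bytes in hex, and a
# printable view with every non-printable byte shown as "."
while ( my $n = read( $in, my $buf, 16 ) ) {
    ( my $text = $buf ) =~ s/[^[:print:]]/./g;
    printf( "%08x: %-47s  %s\n", $offset,
            join( " ", map { sprintf( "%02x", $_ ) } unpack( "C*", $buf ) ),
            $text );
    $offset += $n;
}
close $in;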

(But really, learning to use tools like "od" or "xxd" is better.)

Re^4: Replacing none alpha/numeric characters in a Word document by merrymonk (Hermit) on Dec 29, 2008 at 13:24 UTC
Thank you for your comments. Now that Christmas is over, I am returning to my problem of getting Wingdings characters to replace existing single characters or strings in Word documents. I liked using the Perl lines

$search->{Text} = $oldtext;
$replace->{Text} = $newtext;
$search->Execute({Replace => wdReplaceAll});

since these seem to preserve the font (and all other characteristics) of the characters that are identical to whatever is in $oldtext. I am also comfortable with using definitions of the old text such as "\x{00BD}".

The problem comes with any characters from Wingdings. As a test I created a Word document that contained a ‘pencil’ pointing from top right to bottom left. This has a hex value of 0021. I then used the following three lines of Perl, hoping to get the next Wingdings character (scissors):

$search->{Text} = "\x{0021}";
$replace->{Text} = "\x{0022}";
$exec_res = $search->Execute({Replace => wdReplaceAll});

$exec_res returned a value of 0 and the replacement had failed. How can I overcome this failure?
