in reply to Re^2: Replacing none alpha/numeric characters in a Word document
in thread Replacing none alpha/numeric characters in a Word document
Your lines ranging from "001 A001" to "256 A256" don't make any sense to me. What was the point of that? Note that "\x{01}" is an ASCII control character, as is everything up through "\x{1f}", and so is "\x{7f}" -- these will display nothing (or could make your display do weird things). Also, please be careful not to confuse "backslash" \ with "forward slash" / -- these are very different things.
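If you want to see which control characters are actually sitting in a string, rather than guessing from a blank or garbled display, something like the following might help. This is only a minimal sketch: the sub name "show_controls" is made up for illustration, and it only covers the ASCII control range described above.

```perl
use strict;
use warnings;

# Replace each ASCII control character (0x00-0x1f and 0x7f) with a
# visible "\x{..}" escape, so it can be printed without upsetting the display.
sub show_controls {
    my ($text) = @_;
    ( my $out = $text ) =~ s/([\x00-\x1f\x7f])/sprintf( "\\x{%02x}", ord $1 )/ge;
    return $out;
}

# prints: A\x{01}B\x{7f}C\x{0a}
print show_controls("A\x{01}B\x{7f}C\n"), "\n";
```

Ordinary printable characters pass through untouched, so you can run whole lines through it before printing them.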
If using "\x{....}" works, in the sense that it lets the perl script insert any Unicode character you want into the doc file, then what more is there to worry about? That's the way to go. You just need to be able to find the hexadecimal Unicode code point for each character you want to insert. (That's why I pointed to that "handy tool", to provide a way to search the Unicode character table.)
For example, as you should have figured out by now, "\x{2018}" is the "left single quotation mark" and "\x{2019}" is the "right single quotation mark", regardless of whether you are using a Times font, or Courier, or Arial, or Helvetica, or ...
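Perl can also do the name/code-point lookup itself, through the core charnames module. Here is a rough sketch of that, using the two quotation marks as the example; the sub names "code_point_for" and "name_for" are my own inventions for illustration, while charnames::vianame and charnames::viacode are the module's run-time lookup functions.

```perl
use strict;
use warnings;
use charnames ();    # empty import list: we only want the run-time functions

# Return "U+XXXX" for an official Unicode character name,
# or undef if the name is not known.
sub code_point_for {
    my ($name) = @_;
    my $cp = charnames::vianame($name);
    return defined $cp ? sprintf( "U+%04X", $cp ) : undef;
}

# Return the official Unicode name for a numeric code point.
sub name_for {
    my ($cp) = @_;
    return charnames::viacode($cp);
}

for my $name ( "LEFT SINGLE QUOTATION MARK", "RIGHT SINGLE QUOTATION MARK" ) {
    print "$name is ", code_point_for($name), "\n";
}
print "0x2019 is ", name_for(0x2019), "\n";
```

The four hex digits after "U+" are exactly what goes inside the braces of "\x{....}".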
As for going "via the hex dump route", if you have "unix tools for windows", you can check out either "od" or "xxd" (though I am sure there are hexdump tools that are "native" to windows, as well). Or you can whip up something pretty easily in perl -- here's a basic/simple hexdump tool:
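    #!/usr/bin/perl
    use strict;
    use warnings;

    die "Usage: $0 file_name\n" unless ( @ARGV == 1 and -f $ARGV[0] );
    open( my $in, "<", $ARGV[0] ) or die "Cannot open $ARGV[0]: $!\n";
    binmode $in;    # read raw bytes, no newline or encoding translation

    my $offset = 0;
    my $buf;
    # read the file 16 bytes at a time; read() returns 0 at end of file
    while ( my $n = read( $in, $buf, 16 ) ) {
        # text column: non-printable bytes are shown as "."
        ( my $text = $buf ) =~ s/[^[:print:]]/./g;
        printf( "%08x: %-47s %s\n",
                $offset,
                join( " ", map { sprintf( "%02x", $_ ) } unpack( "C*", $buf ) ),
                $text );
        $offset += $n;
    }
    close $in;

(But really, learning to use tools like "od" or "xxd" is better.)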
Re^4: Replacing none alpha/numeric characters in a Word document
by merrymonk (Hermit) on Dec 29, 2008 at 13:24 UTC