comment on

"Times" is a font, not a character encoding. The font says what the characters look like when displayed or printed; the encoding says which bit patterns (numeric values) are mapped to which characters.

Your lines ranging from "001 A001" to "256 A256" don't make any sense to me. What was the point of that? Note that "\x{01}" is an ASCII control character, likewise up through "\x{19}", and for "\x{7f}") -- these will display nothing (or could make your display do weird things). Also, please be careful that you don't confuse "backslash" \ with "forward slash" / -- these are very different things.

If using the "\x{....}" works in terms of allowing the perl script to insert any Unicode character you want into the doc file, then what more is there to worry about? That's the way to go. You just need to be able to find the hex-numeric unicode code-point value for the characters you want to insert. (That's why I pointed to that "handy tool", to provide a way to search the unicode character table.)

For example, as you should have figured out by now, "\x{200C}" is the "left single quotation mark" and "\x{200D}" is the "right single quotation mark", regardless whether you are using a Times font, or Courier, or Arial, or Helvetica, or ...

As for going "via the hex dump route", if you have "unix tools for windows", you can check out either "od" or "xxd" (though I am sure there are hexdump tools that are "native" to windows, as well). Or you can whip up something pretty easily in perl -- here's a basic/simple hexdump tool:

#!/usr/bin/perl

die "Usage:  $0  file_name\n" unless (@ARGV==1 and -f $ARGV[0]);

open( I, "<", $ARGV[0] );
binmode I;
$offset = 0;
while ( $n = read( I, $b, 16 )) {
    ( $c = $b ) =~ s/[^[:print:]]/./g;
    printf( "%08x: %-47s  %s\n", $offset,
            join( " ", map{ sprintf( "%02x", $_ )} unpack( "C*", $b ))
+,
            $c );
    $offset += $n;
}
[download]

(But really, learning to use tools like "od" or "xxd" is better.)

In reply to Re^3: Replacing none alpha/numeric characters in a Word document by graff
in thread Replacing none alpha/numeric characters in a Word document by merrymonk

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.