Thank you. That prints to STDOUT very nicely. So one problem resolved.
But it doesn't solve the original problem. Here's a worked example.
When I parse my two "authoritative" spreadsheets of names of Palearctic birds, I would hope that both authorities would have the same common name for the species whose Latin binomial is Phoenicurus erythrogastrus.
But they don't.
One calls it Güldenstädt's Redstart
The other Güldenstädt’s Redstart
(That's the difference between 0027;APOSTROPHE; and 2019;RIGHT SINGLE QUOTATION MARK (fide C:\Perl\lib\unicore\UnicodeData.txt))
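To see exactly where two such names part company, something like the following can help. It is only a sketch: diff_codepoints() and describe() are names made up for this example, and the strings are written with \x{...} escapes so that the result does not depend on how the script file itself is encoded.

```perl
use strict;
use warnings;
use charnames ();    # core module; charnames::viacode() maps a code point to its name

# Describe one character as "U+XXXX NAME", or "(none)" past the end of a string.
sub describe {
    my ($char) = @_;
    return '(none)' unless defined $char;
    my $cp   = ord $char;
    my $name = charnames::viacode($cp);
    $name = 'UNKNOWN' unless defined $name;
    return sprintf 'U+%04X %s', $cp, $name;
}

# Return one report line for every position at which the two strings differ.
sub diff_codepoints {
    my ($left, $right) = @_;
    my @l   = split //, $left;
    my @r   = split //, $right;
    my $max = @l > @r ? scalar @l : scalar @r;
    my @report;
    for my $i (0 .. $max - 1) {
        my ($x, $y) = ($l[$i], $r[$i]);
        next if defined $x && defined $y && $x eq $y;
        push @report, sprintf '%d: %s vs %s', $i, describe($x), describe($y);
    }
    return @report;
}

# The two spellings from the spreadsheets (\x{FC} = u with diaeresis, \x{E4} = a with diaeresis).
my $name_one = "G\x{FC}ldenst\x{E4}dt's Redstart";
my $name_two = "G\x{FC}ldenst\x{E4}dt\x{2019}s Redstart";

print "$_\n" for diff_codepoints($name_one, $name_two);
# prints: 11: U+0027 APOSTROPHE vs U+2019 RIGHT SINGLE QUOTATION MARK
```

Only plain ASCII is printed, so no output layer is needed on STDOUT here.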
My current solution is to do an s/// on each $string from each OOorg spreadsheet, as follows:

  # Look for code points that are not in Basic_Latin, for example the sign ü.
  # If $1 = ü, then ord($1) = 252, and we ask: is there a value in %subs for key 252?
  #   If yes ( $subs{252} = ü ), return ü.
  #   If no, return a marker such as "<$subs{252} = LATIN SMALL LETTER U WITH DIAERESIS;>".
  # So if a sign was found that is absent from the hash, the outfile will
  # contain "<$subs{8224} = DAGGER;>" etc.
  # You then have to write into make_the_subs_hash() a line like this:
  #   $subs{8224} = '¦';
  # That's at [1] below. Then re-run the script with the extended %subs.
  $string =~ s/(\P{InBasic_Latin})/
     defined( $subs{ord($1)} )
       ? $subs{ord($1)}
       : ' <$subs{'
         . ord($1)
         . "} = ${charinfo(ord($1))}{name};> "
                                     /egx;    # e = execute, g = repeated, x = spaced-out regex
  return($string);

The hash %subs is made as follows:

  foreach my $i (126 .. 255) {
      $subs{$i} = chr($i);
  }
  # Plus higher value code points found empirically; see [1] above
  $subs{338}  = 'OE';# LATIN CAPITAL LIGATURE OE
  $subs{339}  = 'oe';# LATIN SMALL LIGATURE OE
  $subs{8217} = "'" ;# RIGHT SINGLE QUOTATION MARK
  $subs{8224} = '×' ;# DAGGER

Ugly, but at least everyone can see what is going on.
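For anyone who wants to try the idea, here is a minimal self-contained sketch of the two fragments put together. Some things in it are assumptions rather than my actual script: normalise_name() and replacement_for() are names invented for the example, %subs is passed by reference instead of being a global, the "not Basic_Latin" test is spelled as the explicit range [^\x00-\x7F] (the same code points), and DAGGER is deliberately left out of the table so that the marker shows up.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);    # core module; charinfo() returns a hashref with a 'name' key

# Build the table: code points 128..255 map to themselves, plus the
# higher code points found empirically in the spreadsheets.
sub make_the_subs_hash {
    my %subs = map { $_ => chr($_) } 128 .. 255;
    $subs{338}  = 'OE';    # LATIN CAPITAL LIGATURE OE
    $subs{339}  = 'oe';    # LATIN SMALL LIGATURE OE
    $subs{8217} = "'";     # RIGHT SINGLE QUOTATION MARK
    return %subs;          # 8224 (DAGGER) left out on purpose for this demo
}

# Return the substitute for one code point, or a marker naming the
# missing key so it can be added to make_the_subs_hash() later.
sub replacement_for {
    my ($cp, $subs) = @_;
    return $subs->{$cp} if defined $subs->{$cp};
    my $info = charinfo($cp);
    my $name = $info ? $info->{name} : 'UNKNOWN';
    return " <\$subs{$cp} = $name;> ";
}

# Replace every character outside Basic Latin (U+0000..U+007F).
sub normalise_name {
    my ($string, $subs) = @_;
    $string =~ s/([^\x00-\x7F])/replacement_for(ord($1), $subs)/eg;
    return $string;
}

my %subs = make_the_subs_hash();
binmode STDOUT, ':encoding(UTF-8)';    # the output still contains non-ASCII letters

my $one = normalise_name("G\x{FC}ldenst\x{E4}dt's Redstart",        \%subs);
my $two = normalise_name("G\x{FC}ldenst\x{E4}dt\x{2019}s Redstart", \%subs);
print $one eq $two ? "same name\n" : "different names\n";    # prints: same name

print normalise_name("Redstart\x{2020}", \%subs), "\n";
# prints: Redstart <$subs{8224} = DAGGER;> 
```

Once both spreadsheets go through the same table, the two spellings compare equal, and any sign not yet in the table announces itself in the output.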
Richard H

In reply to Re^6: One bird, two Unicode names by RCH
in thread One bird, two Unicode names by RCH
