Thank you for your thorough explanation, tye. You answered my question.

The W3C is doing the right thing. (See 8.2.2.2 Character encodings in the HTML5 working draft specification.) Its willful violation of anachronistic standards for compelling, practical reasons is, IMHO, a practice that is overdue in Perl 5. By now, Perl 5 should also be defaulting to Windows-1252 instead of to ISO 8859-1 (Latin 1). Its failure to do this is one of the little things that make Perl 5 seem old and crufty, especially to Windows programmers. By dogmatically adhering to some misguided commitment to compatibility and portability, Perl 5 violates the principle of least astonishment.

By the way, I had done something like this…

C:\>chcp Active code page: 1252 C:\>type match_test_3.pl #!perl use strict; use warnings; use open qw( :encoding(Windows-1252) :std ); my $pattern = qr/\A\w+\z/; for my $word (@ARGV) { my $result = $word =~ $pattern ? "matches" : "doesn't match"; printf qq/The word "%s" %s the pattern %s\n/, $word, $result, $pat +tern; } C:\>perl match_test_3.pl Tšekissä Žena Œdipus Rex "\x{009a}" does not map to cp1252 at match_test_3.pl line 12. The word "T\x{009a}ekissä" doesn't match the pattern (?^:\A\w+\z) "\x{008e}" does not map to cp1252 at match_test_3.pl line 12. The word "\x{008e}ena" doesn't match the pattern (?^:\A\w+\z) "\x{008c}" does not map to cp1252 at match_test_3.pl line 12. The word "\x{008c}dipus" doesn't match the pattern (?^:\A\w+\z) The word "Rex" matches the pattern (?^:\A\w+\z) C:\>

…before I posted my inquiry here to prove to myself that the problem wasn't just with the use within the Perl source file of Windows-1252 characters in the range from 80 thru 9F.

There's a Feedback button at the bottom of the page http://www.fileformat.info/info/unicode/char/009a/index.htm. ☺

Thanks again.

Jim


In reply to Re^2: Windows-1252 characters from \x{0080} thru \x{009f} (source-code encoding) by Jim
in thread Windows-1252 characters from \x{0080} thru \x{009f} by Jim

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.