in the absence of "use utf8", the perl should believe that the file is encoded using character set defined in locale.

I don't think this would be a good idea.  The encoding of source files is something that's tied to the files themselves, not the environment they're run in.  In other words, when moving Perl code to a different locale, you'd risk breaking things (unnecessarily)...
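To make that concrete, here is a minimal sketch of what goes wrong when the encoding is guessed from the environment. It doesn't touch the parser at all; it just decodes the same two bytes under two different assumptions. The helper name code_points_as is made up for illustration, and the two encodings are merely stand-ins for "what the file really is" and "what some other locale claims".

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes a UTF-8 editor writes to disk for the single letter U+00E4.
my $bytes_in_file = "\xC3\xA4";

# Hypothetical helper: interpret the bytes under the given encoding and
# return the resulting code points in U+XXXX notation.
sub code_points_as {
    my ($encoding, $bytes) = @_;
    my $text = decode($encoding, $bytes);
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $text;
}

# Same bytes, two assumptions about the encoding.
for my $encoding ('UTF-8', 'ISO-8859-1') {
    printf "%-10s => %s\n", $encoding, code_points_as($encoding, $bytes_in_file);
}
```

Read as UTF-8, the bytes are one character; read as ISO-8859-1, they are two different ones. A string literal in a source file would change meaning in the same way if the interpreter took its cue from the locale.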

Other than that, I agree with the tenor of your post and also do think it would be nice to have locales work in combination with Unicode.  After all, locales comprise more than just the definition of valid characters.

However, I'm not proficient enough with locales (nor with the Perl sources) to help out with patches, so I'm not complaining...  (Heck, I'm not even sure how things are supposed to work in some respects.  Say the locale is set to LC_CTYPE=de_DE.UTF-8: should all characters defined in Unicode match \w, or just the ones actually used in the respective language/region?  For example, both 'ä' and '䕧' (U+4567) are valid letters according to Unicode, but the latter is not a valid letter in German, so one might argue it shouldn't match \w when the de_DE locale is in effect.)
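For reference, this is roughly how the pure-Unicode side of that question looks, as a small sketch. It uses the /u regex modifier (Perl 5.14 or later) to force Unicode rules, so it says nothing about what a locale-aware \w would or should do. The helper name is_word_char is made up, and the characters are built from code points so the result doesn't depend on how the script itself is encoded.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the single character matches \w under
# Unicode rules. The /u modifier requires Perl 5.14 or later.
sub is_word_char {
    my ($char) = @_;
    return $char =~ /\A\w\z/u ? 1 : 0;
}

my @samples = (
    [ 'U+00E4', "\x{E4}"   ],   # LATIN SMALL LETTER A WITH DIAERESIS
    [ 'U+4567', "\x{4567}" ],   # CJK ideograph, not a letter used in German
    [ 'U+0021', "!"        ],   # punctuation, for contrast
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    printf "%s %s \\w\n", $name, is_word_char($char) ? 'matches' : 'does not match';
}
```

Under Unicode rules both letters match and the exclamation mark does not. Whether U+4567 should stop matching once de_DE is in effect is exactly the open question.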


In reply to Re: Locale and Unicode, enemies in perl? by Eliya
in thread Locale and Unicode, enemies in perl? by andal
