I don't see in ikegami's script the need for use utf8;.

The OP as well as ikegami's script contain the string 'Fräsen und ndk (Kamera - Fräsaufnahme)'. From utf8: "The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope. ... Do not use this pragma for anything else than telling Perl that your script is written in UTF-8. ... Because it is not possible to reliably tell UTF-8 from native 8 bit encodings, you need either a Byte Order Mark at the beginning of your source code, or use utf8;, to instruct perl."

Although the "ä" may appear to work because it is part of the Latin-1 character set, which Perl typically uses internally, it will most likely not do what you want for any Unicode characters outside that set. As you can see below, the only version of the code in which the UTF8 flag is properly set on the string is the one where the source is encoded as UTF-8 and use utf8; is in effect. The rule of thumb I always use is to either work entirely in ASCII (using escapes such as \N{} to specify Unicode characters), or otherwise encode the source code as UTF-8 and add use utf8;. See also perluniintro and perlunicode.

$ cat with_utf8.pl
use warnings;
use strict;
use utf8;
use Devel::Peek;
my $string = 'Fräsen und ndk (Kamera - Fräsaufnahme)';
Dump($string);

$ perl -pe 's/^(?=.*utf8)/#/' with_utf8.pl | tee without_utf8.pl
use warnings;
use strict;
#use utf8;
use Devel::Peek;
my $string = 'Fräsen und ndk (Kamera - Fräsaufnahme)';
Dump($string);

$ iconv -f UTF-8 -t Latin1 without_utf8.pl -o latin1.pl

$ file -i *.pl
latin1.pl:       text/plain; charset=iso-8859-1
without_utf8.pl: text/plain; charset=utf-8
with_utf8.pl:    text/plain; charset=utf-8

$ perl latin1.pl
SV = PV(0x1365d70) at 0x13855c0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x13d7160 "Fr\344sen und ndk (Kamera - Fr\344saufnahme)"\0
  CUR = 38
  LEN = 40
  COW_REFCNT = 1

$ perl without_utf8.pl
SV = PV(0xa15d70) at 0xa355c0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0xa87190 "Fr\303\244sen und ndk (Kamera - Fr\303\244saufnahme)"\0
  CUR = 40
  LEN = 42
  COW_REFCNT = 1

$ perl with_utf8.pl
SV = PV(0x18d5d70) at 0x18f55d8
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x19384a0 "Fr\303\244sen und ndk (Kamera - Fr\303\244saufnahme)"\0 [UTF8 "Fr\x{e4}sen und ndk (Kamera - Fr\x{e4}saufnahme)"]
  CUR = 40
  LEN = 42
  COW_REFCNT = 1
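For comparison, the other half of the rule of thumb (keeping the source pure ASCII and writing the umlaut as an escape) might look like the sketch below. It is not taken from the OP's or ikegami's script: the variable names and the explicit encoding step are assumptions for illustration, and it assumes a reasonably recent Perl in which \N{U+...} works without loading charnames.

```perl
use strict;
use warnings;
use Encode qw(encode);

# ASCII-only source: the umlaut is written as an escape,
# so "use utf8;" is not needed to get a proper character string.
my $string = "Fr\N{U+00E4}sen und ndk (Kamera - Fr\N{U+00E4}saufnahme)";

# length() on a character string counts characters, not bytes.
my $chars = length $string;

# Encode explicitly at the point where bytes are needed.
my $bytes  = encode('UTF-8', $string);
my $nbytes = length $bytes;

# Declare the output encoding so the "ä" is written as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$string\n";
printf "%d characters, %d bytes as UTF-8\n", $chars, $nbytes;
```

The two counts should match the CUR values in the dumps above: 38 for the string with one unit per character, 40 once each "ä" takes two bytes in UTF-8.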

Updated as per ikegami's reply.


In reply to Re^3: german Alphabet by haukex
in thread german Alphabet by shreedara75
