Here's a simple tip: if you want to deal with text rather than with encoding issues (and you almost always do), use Perl's IO layers for input and output.

Note that my $iso_8859_1 = 'Österreich'; is only guaranteed to be ISO-8859-1 encoded if you know that the source file itself is saved as ISO-8859-1 (instead of UTF-8) and you haven't switched on "use utf8" somewhere. Getting that wrong can cause all kinds of interesting issues.
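To make the bytes-versus-text difference concrete, here's a small sketch. It spells the bytes out with escapes so the result doesn't depend on how this file itself is saved; the variable names are just for illustration.

```perl
use strict;
use warnings;
use Encode ();

# The same word as raw bytes in two encodings (escapes keep this file 7-bit ASCII).
my $latin1_bytes = "\xd6sterreich";        # ISO-8859-1: one byte for the O-umlaut
my $utf8_bytes   = "\xc3\x96sterreich";    # UTF-8: two bytes for the O-umlaut

# Undecoded, length() counts bytes.
printf "latin1 bytes: %d, utf8 bytes: %d\n",
    length($latin1_bytes), length($utf8_bytes);

# Decoded, both become the same 10-character text string.
my $from_latin1 = Encode::decode("iso-8859-1", $latin1_bytes);
my $from_utf8   = Encode::decode("UTF-8",      $utf8_bytes);

printf "decoded: %d and %d characters, equal: %s\n",
    length($from_latin1), length($from_utf8),
    ($from_latin1 eq $from_utf8 ? "yes" : "no");
```

This should print 10 and 11 for the byte lengths, and 10 characters for both decoded strings, which compare equal. A string literal containing a raw 'Ö' ends up as one or the other byte sequence depending on how the source file was saved, which is why it is hard to reason about.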

Also note that this is exactly the kind of thing you do NOT want to have to deal with. I'm tempted to say: just make a habit of use()ing utf8 and save all your scripts as UTF-8, or only use 7-bit ASCII in source files.

The only sane way to deal with Unicode IO is to keep every string correctly flagged as being either text (in the "internal multibyte encoding") or binary/8-bit data, use IO layers for input/output, use Encode::decode() to interpret binary strings directly if you have to, and never, ever use Encode::encode():

use Encode ();

# decode the ISO-8859-1 bytes into a text string
my $string = Encode::decode("iso-8859-1", "\x{d6}sterreich");

# we want to write a utf-8 file
open my $fh, ">:utf8", "/some/path" or die $!;
print $fh $string;
close $fh or die $!;
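The same idea works on the input side, so you never need to call Encode yourself. Here's a minimal sketch of a full round trip: read an ISO-8859-1 file through an input layer, write it back out as UTF-8 through an output layer. The temporary directory and file names are made up for the example, and it uses :encoding(UTF-8), the stricter, validating spelling of the UTF-8 layer, where :utf8 would also work for output.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch files, removed automatically when the script exits.
my $dir = tempdir(CLEANUP => 1);
my ($latin1_path, $utf8_path) = ("$dir/latin1.txt", "$dir/utf8.txt");

# Set up a sample ISO-8859-1 input file by writing raw bytes.
open my $raw, ">:raw", $latin1_path or die $!;
print $raw "\xd6sterreich\n";
close $raw or die $!;

# Input layer: bytes are decoded to text as they are read.
open my $in, "<:encoding(iso-8859-1)", $latin1_path or die $!;
my $text = <$in>;
close $in or die $!;
chomp $text;

# Output layer: text is encoded to UTF-8 as it is written.
open my $out, ">:encoding(UTF-8)", $utf8_path or die $!;
print $out "$text\n";
close $out or die $!;

# Read the result back as raw bytes to inspect what was written.
open my $check, "<:raw", $utf8_path or die $!;
my $bytes = do { local $/; <$check> };
close $check or die $!;

printf "text: %d characters, file: %d bytes\n", length($text), length($bytes);
```

Between the two open() calls, $text is just text: length(), regexes and so on work on characters, and the encoding only matters at the file boundary.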

In reply to Re: Writing unicode characters to file using open($fh, ">:utf8, $name) mangles unicode? by Joost
in thread Writing unicode characters to file using open($fh, ">:utf8, $name) mangles unicode? by telcontar
