How are you READING the UTF-8 data? Output is hard to get wrong: you just set an :encoding or :utf8 layer on the output handle.
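A minimal sketch of the output side, assuming a made-up sample string and an in-memory filehandle in place of a real file (both are illustration only, not from the original question):

```perl
use strict;
use warnings;

# A character string containing U+00F8 (the letter o with stroke), written
# as an escape so this source file stays pure ASCII. Sample data only.
my $text = "bl\x{F8}d\n";

# Write through an :encoding(UTF-8) layer. The target is an in-memory
# scalar, so the sketch does not need a real file on disk.
my $bytes = '';
open(my $out, '>:encoding(UTF-8)', \$bytes) or die "open failed: $!";
print {$out} $text;
close($out) or die "close failed: $!";

# Show the encoded bytes in hex: U+00F8 has become the two bytes C3 B8.
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
```

The same layer can be put on an already open handle with binmode, for example binmode(STDOUT, ':encoding(UTF-8)').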

However, if you use :utf8 for input, you're in for trouble (malfunctions and security bugs), because :utf8 marks the incoming bytes as UTF-8 without validating them. Always use :encoding for text input.

The error message about 0xF8 (which is the Danish ø character, not æ; æ is 0xE6) suggests to me that the input is NOT UTF-8 but ISO-8859-1 or ISO-8859-15, and that :encoding(utf8) was used on it. Update: I originally wrote :utf8 here but meant :encoding(utf8); :utf8 should of course not be used for input.

If the input is ISO-8859 and the input layer is :utf8, you get lots of errors, and you should be happy if any part of your program works correctly. That is probably not what is happening here.

If the input is ISO-8859, and the input layer is :encoding(utf8), you get substitution characters for practically all non-ASCII characters.
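The mismatch can be sketched with Encode::decode instead of an I/O layer. That swap is an assumption made to keep the example small: an input layer may report the same problem differently, for instance with a warning like the one in the thread title. The sample bytes are made up.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Made-up sample: the bytes of "bl<o-slash>d" in ISO-8859-1/-15, where the
# letter is the single byte 0xF8. This is NOT valid UTF-8.
my $latin_bytes = "bl\xF8d";

# Decoding with the wrong encoding: the invalid byte cannot be mapped, so
# decode() (with its default error handling) substitutes U+FFFD.
my $wrong = decode('UTF-8', $latin_bytes);

# Decoding with the encoding the data is really in gives U+00F8.
my $right = decode('ISO-8859-15', $latin_bytes);

printf "wrong decoding contains U+FFFD: %s\n",
    ($wrong =~ /\x{FFFD}/ ? 'yes' : 'no');
printf "right decoding contains U+00F8: %s\n",
    ($right =~ /\x{F8}/ ? 'yes' : 'no');
```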

The only correct way to read an ISO-8859-15 text file or stream is to use :encoding(ISO-8859-15). The layer can also be chosen automatically from the locale with "use open"; see its documentation. Note that doing so is likely to introduce problems for other users, especially those who don't have any locale set but do have a UTF-8 capable terminal. This, however, is not a Perl problem.
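A minimal sketch of reading with an explicit layer, assuming made-up sample bytes and an in-memory filehandle standing in for the real file or stream:

```perl
use strict;
use warnings;

# Made-up ISO-8859-15 input: 0xF8 is o-slash and 0xA4 is the euro sign.
my $latin9_bytes = "bl\xF8d \xA4 5\n";

# Read through the layer that matches the data. With a real file, the
# scalar reference would simply be replaced by the file name.
open(my $in, '<:encoding(ISO-8859-15)', \$latin9_bytes)
    or die "open failed: $!";
my $line = <$in>;
close($in) or die "close failed: $!";

# $line is now a character string (U+00F8, U+20AC). Encode it again on the
# way out, here as UTF-8 for a UTF-8 capable terminal.
binmode(STDOUT, ':encoding(UTF-8)');
print $line;
```

Decoding on input and encoding on output keeps everything in between working on characters rather than bytes.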

If you haven't already done so, please forget everything you've ever read and learned about Perl Unicode support, and read perlunitut.

Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }


In reply to Re: i18n/utf8 problem, 'utf8 "\xF8" does not map to Unicode' by Juerd
in thread i18n/utf8 problem, 'utf8 "\xF8" does not map to Unicode' by bcrowell2
