When I ran your script on Mac OS X, in a "Terminal" window with the character encoding set to UTF-8, it displayed some of the lines with the expected single-column accented character (e.g. á ã à and so on), but for others it displayed two separate glyphs: the unaccented character followed by the diacritic in the second column.

This is what I would expect, given that only some combinations of letters and diacritics are actually used in various human languages, and it's only the ones that are used that get a "unified glyph" in standard fonts.
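As a rough way to see which letter+diacritic pairs have a single pre-combined code point, here is a minimal sketch in Python (used purely for illustration; the helper name `has_precomposed_form` is made up for this example). It assumes that "NFC collapses the pair into one code point" is a reasonable proxy for "a unified character exists"; whether a given font actually has a glyph for it is a separate question that this does not check.

```python
import unicodedata

def has_precomposed_form(base: str, mark: str) -> bool:
    """Return True if NFC folds base + combining mark into one code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
TILDE = "\u0303"  # COMBINING TILDE

# a+acute and a+tilde are used in real orthographies; q+tilde is not.
for base, mark in [("a", ACUTE), ("a", TILDE), ("q", TILDE)]:
    print(repr(base + mark), has_precomposed_form(base, mark))
```

The pairs that report False are the ones most likely to show up as two columns in a terminal that only coalesces sequences with a pre-combined equivalent.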

If I had a different process for displaying text -- particularly, one that treated all those letter-plus-accent sequences the same way (e.g. print the letter, backspace, then print the accent without erasing the letter, or detect the letter+accent sequence and print them both before advancing the cursor to the next column), then everything would be the way you want it. Instead, my process only knows how to "coalesce" a letter+accent sequence when it happens to match an accented character that exists in the font. (I guess whatever you're using to display the text, it doesn't know how to do even that much.)
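The "detect the letter+accent sequence and print them both before advancing the cursor" idea can be sketched roughly as follows. This is an illustration in Python, not how Terminal or any other program actually does it; the function names `clusters` and `column_count` are hypothetical, and the sketch assumes every base character occupies exactly one column (so it ignores wide East Asian characters and other special cases).

```python
import unicodedata

def clusters(text):
    """Group each base character with the combining marks that follow it."""
    groups = []
    for ch in text:
        # unicodedata.combining() returns a nonzero class for combining marks.
        if groups and unicodedata.combining(ch):
            groups[-1] += ch      # attach the mark to the previous base
        else:
            groups.append(ch)     # start a new display cell
    return groups

def column_count(text):
    """Number of columns if every cluster is drawn in a single cell."""
    return len(clusters(text))

sample = "q\u0303a\u0301"         # q + tilde, a + acute
print(clusters(sample), column_count(sample))
```

A display process built this way treats every letter+accent sequence the same, whether or not a pre-combined character exists for it.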

Bear in mind that while the Unicode standard does set a "canonical ordering" for combining marks when accented letters are expressed as character sequences, it also treats each pre-combined form as canonically equivalent to its sequence, and the usual recommendation (e.g. normalizing to NFC) is to use the pre-combined forms where they exist. (Of course, rules are made to be broken, but this is an area where breaking the rules might not be worth it.)
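To make "canonical ordering" and the pre-combined preference concrete, here is a small sketch in Python (again only for illustration; the variable names are mine). It relies on the standard `unicodedata` module: NFD decomposes and sorts combining marks by their combining class, and NFC then recomposes into pre-combined characters where they exist.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220

# The same letter with the same two marks, typed in different orders.
one = "a" + ACUTE + DOT_BELOW
two = "a" + DOT_BELOW + ACUTE

# NFD puts the marks into canonical order (lower class first),
# so both spellings end up as the same code point sequence.
nfd_one = unicodedata.normalize("NFD", one)
nfd_two = unicodedata.normalize("NFD", two)
print(nfd_one == nfd_two)

# NFC prefers the pre-combined character when there is one.
print(unicodedata.normalize("NFC", "a" + ACUTE) == "\u00e1")
```

Normalizing text to NFC before printing is one way to give a terminal the best chance of finding a single-column glyph.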


In reply to Re: Problem with unicode combination diacritics by graff
in thread Problem with unicode combination diacritics by muba
