Oh dear. I was sure that was going to work.

Are you sure that it's actually changing them to question marks, and not just that your terminal can't display the Unicode characters? Sorry for stating the obvious, but you need to view the output in a browser or something else that can display Unicode, or check the hex values to see whether they're correct.
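If you want to check the values rather than trust the terminal, something like this would do it. It's only a rough sketch: `codepoints` and `hex_bytes` are names I made up, the sample string is just an example rather than your data, and I'm assuming your input is UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: list each character of a string as U+XXXX.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $str;
}

# Hypothetical helper: list each byte of an undecoded string as two hex digits.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# Example data only: the UTF-8 bytes for "caf" followed by e-acute.
my $bytes = "caf\xC3\xA9";

# Assumption: the input is UTF-8. decode() turns the bytes into characters.
my $chars = decode('UTF-8', $bytes);

print hex_bytes($bytes),  "\n";   # raw bytes, before decoding
print codepoints($chars), "\n";   # characters, after decoding
print codepoints('?'),    "\n";   # what a real question mark looks like
```

A genuine question mark shows up as U+003F (byte 3F). If you see that where an accented character should be, the data really has been changed; if you see something like U+00E9, the data is fine and it's only a display problem.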

Alternatively, you could work around the problem by doing the input cleanup in a different way, but I think your original approach was correct and it should work if we can get Perl to treat the strings as Unicode.

The fact that it now does something different makes me think the change might be working, and that it's exposing another problem somewhere else.

I might not be online much for a couple of days (I'm actually in an internet cafe in Vientiane, Laos...). This thread seems to have gone quiet, so I would suggest drawing attention to it in the chatterbox at a busy time to get someone else to look at it. Or perhaps it would be justified to start a new thread with your problem more narrowed down.

Good luck...


In reply to Re^7: keeping diacritical marks in a string by FalseVinylShrub
in thread keeping diacritical marks in a string by Foxpond Hollow
