Hi all,

I am getting files from various sources in different European languages. The requirement for these files is that everything must be ASCII-compatible.

So for special characters in Danish, Finnish and so on, the UTF-8 byte codes (UTF codes) are typed in directly as hex. A line of text could therefore contain:

"This line contains 0xC30x86n exotic character."
This should be rendered in HTML and PDF with the correct fused AE character (Æ):
"This line contains Æn exotic character."

Changing the format of the files is not an option, because typing the UTF codes is easy for everyone, even for a character they do not know.

My question is this: how should I read these text files and evaluate these special characters on the fly?
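
To make the requirement concrete, here is a minimal Python sketch of one possible approach, not a definitive solution. It assumes every code is written as `0x` followed by exactly two hex digits, and that consecutive codes form one UTF-8 byte sequence. The names `expand_utf8_codes` and `to_ascii_html` are hypothetical helpers made up for illustration. Note that any literal text that happens to look like a code (for example `0x41`) would also be converted.

```python
import html
import re

# Assumption: a code is "0x" plus exactly two hex digits, and a run of
# adjacent codes (e.g. "0xC30x86") is one UTF-8 byte sequence.
HEX_RUN = re.compile(r"(?:0x[0-9A-Fa-f]{2})+")
HEX_BYTE = re.compile(r"0x([0-9A-Fa-f]{2})")


def _decode_run(match):
    """Decode one run of 0xHH codes into the character(s) it encodes."""
    raw = bytes.fromhex("".join(HEX_BYTE.findall(match.group(0))))
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: leave the original text untouched.
        return match.group(0)


def expand_utf8_codes(line):
    """Replace every run of 0xHH codes in a line with real characters."""
    return HEX_RUN.sub(_decode_run, line)


def to_ascii_html(text):
    """Escape text for HTML, keeping the output pure ASCII by writing
    non-ASCII characters as numeric character references."""
    return html.escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")
```

With this sketch, `expand_utf8_codes("This line contains 0xC30x86n exotic character.")` gives the line with a real Æ in it, and `to_ascii_html` can then be applied when writing HTML. Since the input files are plain ASCII, they could be read line by line with `open(path, encoding="ascii")` and each line passed through `expand_utf8_codes`.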

Some pointers would be much appreciated. Many thanks, Chandra

In reply to Evaluating UTF codes in a file by Anonymous Monk
