Hello Balawoo,

I admit that I'm having some difficulties relating your attempts to my recommendations.

If you change the format to XLSX files, then there'll be no more MacRoman encoding: all strings in XLSX files are encoded in UTF-8. Furthermore, you don't need to decode anything, because Spreadsheet::ParseXLSX will do that for you. So, you've found another way to get rid of that problem.

Your method of creating the text file in UTF-8 (encoding the individual cells and then writing them without an encoding layer on the file handle) sort of works, but I would really recommend that you open the file with a UTF-8 encoding layer like this:

open (TXT, ">:encoding(UTF-8)", $txt) || die("Could not open file $txt for writing: $!");

Of course, you need to read this file as UTF-8 as well:

open (SOURCE, "<:encoding(UTF-8)", $txt) || die ("Could not open file $txt for reading: $!");
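
To illustrate why the encoding layer makes the per-cell encode() calls unnecessary, here is a minimal round-trip sketch. The file name cells.txt, the sample string and the helper roundtrip() are made up for this example; I use lexical file handles here, but the bareword handles above behave the same way.

```perl
use strict;
use warnings;

# Write $text to $path through a UTF-8 layer, then read it back the same way.
sub roundtrip {
    my ($path, $text) = @_;

    open(my $out, '>:encoding(UTF-8)', $path)
        or die "Could not open $path for writing: $!";
    print {$out} $text;            # the layer encodes characters to bytes
    close($out) or die "Could not close $path: $!";

    open(my $in, '<:encoding(UTF-8)', $path)
        or die "Could not open $path for reading: $!";
    local $/;                      # slurp the whole file
    my $back = <$in>;              # the layer decodes bytes to characters
    close($in);
    return $back;
}

my $txt  = 'cells.txt';            # hypothetical file name
my $cell = "caf\x{e9}";            # "café" as a character string, no encode() call
my $back = roundtrip($txt, $cell);

print length($back), "\n";         # 4: characters, not bytes
print -s $txt, "\n";               # 5: é takes two bytes in UTF-8
```

The string you get back is a plain character string again, so anything you do with it afterwards (regexes, length, encode_entities) works on characters rather than on bytes.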

You still haven't convinced me that you need to encode accents like é to &eacute;. XML predefines only five entities (&amp;, &lt;, &gt;, &apos; and &quot;), so if you write &eacute; to an XML file without a DTD that declares it, the file is no longer well-formed. If you want to have the string &eacute; as literal content of the XML element, then you need to encode twice: once to convert é to &eacute;, and a second time (use encode_entities without a second parameter for this) to convert the & character to &amp;. In the XML file you'll then see &amp;eacute;, but an XML processor will read it as &eacute;. Note that you still need to get the use utf8; thing right if you want to pass your string literal as a second parameter to encode_entities.
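
In case the two passes are hard to picture, here is a small sketch of what I mean. It assumes HTML::Entities is installed, and the sample string is made up. To stay clear of the use utf8; question, the second parameter is written as the escape '\x{e9}' instead of a literal é; with a literal é in your source you would need use utf8; and a source file saved as UTF-8.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

my $text = "caf\x{e9}";                        # "café" as a character string

# Pass 1: the second parameter lists the characters to encode, here only é.
my $once = encode_entities($text, '\x{e9}');

# Pass 2: no second parameter, so the default set (which includes &) applies.
my $twice = encode_entities($once);

print "$once\n";                               # caf&eacute;
print "$twice\n";                              # caf&amp;eacute;
```

Called in non-void context like this, encode_entities returns an encoded copy and leaves its argument alone, so $text and $once are unchanged after the calls.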


In reply to Re^5: Help encode_entities doesn't seem to work by haj
in thread [SOLVED] -Help encode_entities doesn't seem to work by Balawoo
