in reply to Re^3: Help encode_entities doesn't seem to work
in thread [SOLVED] -Help encode_entities doesn't seem to work

Hello superdoc,

Thanks for your reply.
I have updated the code to use an .xlsx file with:

# STEP1: The data from XLS file is stored in temp TXT file
my $parser = Spreadsheet::ParseXLSX->new();

I haven't changed anything.
In my text file, I have also updated the code like this:
$mac = encode("utf-8", $cell_unformatted);
print TXT "$row;;$col;;", $mac, "\n";
I'm stuck on the decode part. I don't see how to solve it.
As for my XML, as I said, I have adapted a script that was provided for another purpose. I would like to skip my first line, but I don't understand how to do it.
For my text, I need to encode accents like é as &eacute; and I have found out how.
Thanks for everything
Balawoo

Re^5: Help encode_entities doesn't seem to work
by haj (Vicar) on Feb 10, 2019 at 19:18 UTC

    Hello Balawoo,

    I admit that I'm having some difficulties relating your attempts to my recommendations.

    If you change the format to XLSX files, then there'll be no more MacRoman encoding: all strings in XLSX files are encoded in UTF-8. Furthermore, you don't need to decode anything, because Spreadsheet::ParseXLSX will do that for you. So, you've found another way to get rid of that problem.

    Your method of creating the text file in UTF-8 (encoding the individual cells and then writing with Perl's default encoding) sort of works, but I would really recommend that you open the file for UTF-8 encoding like this:

    open (TXT, ">:encoding(UTF-8)", $txt) || die("Could not open file! $txt");

    Of course, you need to read this file as UTF-8 as well:

    open (SOURCE, "<:encoding(UTF-8)", $txt) || die ("Could not open file! $txt");
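
    As a minimal sketch of that round trip (not your actual script): the cell values and the temporary file below are made-up stand-ins for your spreadsheet data, and I'm using lexical filehandles instead of TXT and SOURCE. The point is that no explicit encode or decode call is needed once both handles carry the UTF-8 layer.

```perl
use strict;
use warnings;
use utf8;                       # string literals in this source file are UTF-8
use File::Temp qw(tempfile);

# Hypothetical cell data (row, column, value) standing in for the spreadsheet.
my @cells = ([0, 0, "café"], [0, 1, "élève"]);

# Temporary file used only for this illustration.
my ($tmp_fh, $txt) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write: the :encoding(UTF-8) layer encodes every string on output.
open(my $out, ">:encoding(UTF-8)", $txt) or die "Could not open file! $txt: $!";
print {$out} join(";;", @$_), "\n" for @cells;
close $out or die "Could not close file! $txt: $!";

# Read: the same layer decodes the bytes back into character strings.
open(my $in, "<:encoding(UTF-8)", $txt) or die "Could not open file! $txt: $!";
my @lines = <$in>;
close $in;
chomp @lines;

# Each line now holds characters, not bytes, so é counts as one character.
printf "line %d has %d characters\n", $_, length $lines[$_] for 0 .. $#lines;
```

    If you keep calling encode() on the cells in addition to using the layer, the text gets encoded twice, so pick one of the two approaches.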

    You still haven't convinced me that you need to encode accents like é to &eacute;. If you write &eacute; to an XML file, you get an invalid XML file, because XML itself predefines only &amp;, &lt;, &gt;, &quot; and &apos;. If you want to have the string &eacute; as literal content of the XML element, then you need to encode twice: once to convert é to &eacute;, and a second time (use encode_entities without a second parameter for this) to convert the & character to &amp;. In the XML file you'll then see &amp;eacute;, but an XML processor will read it as &eacute;. Note that you still need to get the use utf8; thing right if you want to pass your string literal as a second parameter to encode_entities.
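
    The two encoding passes could look roughly like this. It is only a sketch: the sample string and the variable names are made up for illustration, and it assumes HTML::Entities is installed and the script itself is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                                   # needed so the literal "é" is one character
use HTML::Entities qw(encode_entities);

my $text = "café";                          # hypothetical cell content

# First pass: encode only é; the second parameter lists the characters to encode.
my $named = encode_entities($text, "é");

# Second pass: no second parameter, so the default unsafe set (including &) is encoded.
my $literal = encode_entities($named);

print "$named\n";
print "$literal\n";
```

    The second value is what you would write into the XML file if you really want an XML processor to hand back the text &eacute; rather than the character é.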