Re: Mixed character encoding issues

I don't have Excel to test your script with, and I'm just beginning to get a handle on Unicode, but a similar problem was recently resolved at Re^3: Perl TK character disappearing. You can't decode your cp1252 input as UTF-8. You can decode it as cp1252 and then save it as UTF-8.

If I were able to run the code, this is the first thing I would try. I like the utf8::all module.


# this will make all filehandles and STDOUT prints be utf8
use utf8::all;   # a very simple module
use Encode qw(decode);   # decode() comes from the Encode module

# get your excel rows and decode them
# Gives error "Cannot decode string with wide characters"
# $_ = decode("utf8", $_) for @row;

# decode the cp1252 stuff properly (modifies @row in place)
$_ = decode("cp1252", $_) for @row;

# prints to the csv file will be utf8
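The decode-then-encode idea can be tried without Excel at all. What follows is only a minimal sketch: the contents of @row are made up for illustration, it uses just the core Encode module, and the explicit encode() call stands in for what a UTF-8 output layer would do when printing.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical row of raw cp1252 bytes, standing in for spreadsheet cells.
# In cp1252, \x93 and \x94 are curly double quotes and \x80 is the euro sign.
my @row = ("\x93quoted\x94", "price \x80 5");

# Step 1: decode the octets into Perl character strings.
my @chars = map { decode("cp1252", $_) } @row;

# Step 2: encode the characters as UTF-8 bytes for output.
my @utf8_bytes = map { encode("UTF-8", $_) } @chars;

# Each of the three cp1252 specials becomes one character but three UTF-8 bytes.
printf "%d chars, %d UTF-8 bytes\n", length($chars[$_]), length($utf8_bytes[$_])
    for 0 .. $#row;
```

If the byte counts come out larger than expected, the data was probably encoded twice somewhere along the way.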

Windows-1252 characters from \x{0080} thru \x{009f} might also be useful to you.
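To show why that range matters, here is a small sketch comparing the two decodings of a single byte. The byte \x93 is just a hypothetical sample; the point is that cp1252 maps it to a printable character while iso-8859-1 maps it to a C1 control code.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $octet = "\x93";   # hypothetical input byte: left curly quote in cp1252

my $as_cp1252 = decode("cp1252",     $octet);
my $as_latin1 = decode("iso-8859-1", $octet);

printf "cp1252:     U+%04X\n", ord $as_cp1252;   # U+201C, left double quotation mark
printf "iso-8859-1: U+%04X\n", ord $as_latin1;   # U+0093, a C1 control character
```

So text from a Windows source that is decoded as iso-8859-1 will not raise an error, but characters in the \x80 to \x9f range will come out wrong.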

I'm not really a human, but I play one on earth.
Old Perl Programmer Haiku ................... flash japh


Replies are listed 'Best First'.

Re^2: Mixed character encoding issues
by ddaupert (Initiate) on Jul 06, 2012 at 16:48 UTC

I appreciate the help, zentara. I did install utf8::all and used it in the packages where I open/close files. Alas, it made no difference as far as output. I got the same character mangling as before. But I am sure the utf8::all module will come in handy. Thanks for mentioning it.

Regarding the decode suggestion, I based my usage on the Encode documentation:

...to convert ISO-8859-1 data into a string in Perl's internal format:

$string = decode("iso-8859-1", $octets);

This was just a 'Hail Mary' attempt; I am not at all convinced my input string is in octets, and when I precede this function with its mate that does a translation into octets, the output gets even more mangled.
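One possible way to check whether a string still holds octets is to look at its code points: anything above 0xFF cannot be a raw byte, which is also the situation that triggers "Cannot decode string with wide characters". The sketch below is only an illustration; the helper names has_wide_chars and dump_codepoints and the sample strings are hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# True if the string holds any code point above 0xFF,
# in which case it cannot be a string of raw octets.
sub has_wide_chars {
    my ($s) = @_;
    return $s =~ /[^\x00-\xFF]/ ? 1 : 0;
}

# Hex dump of every code point, handy for eyeballing what is really there.
sub dump_codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf '%04X', ord($_) } split //, $s;
}

my $octets = "\x93hi\x94";           # hypothetical undecoded cp1252 bytes
my $chars  = "\x{201C}hi\x{201D}";   # hypothetical already-decoded text

for my $s ($octets, $chars) {
    printf "%-20s wide chars: %s\n", dump_codepoints($s),
        has_wide_chars($s) ? 'yes' : 'no';
}
```

A string with no wide characters may still be either octets or decoded Latin-1 text, so this check can only rule octets out, not confirm them.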

When I get off work today, I will look into the other links you mentioned.

/dennis
