in reply to Mixed character encoding issues

I don't have Excel to test your script with, and I'm just beginning to get a handle on Unicode, but a similar problem was recently resolved at Re^3: Perl TK character disappearing. You can't decode your cp1252 input as UTF-8. You can decode it as cp1252 and then save it as UTF-8.
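To make that concrete, here is a minimal sketch, assuming the input really is raw cp1252 bytes. The sample string is made up for illustration; nothing here comes from the original script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: "cafe au lait" with an e-acute, as raw cp1252 bytes
# (0xE9 is e-acute in cp1252).
my $octets = "caf\xE9 au lait";

# A strict UTF-8 decode rejects these bytes: 0xE9 followed by a space is
# not a valid UTF-8 sequence. LEAVE_SRC keeps decode from modifying $octets.
my $as_utf8 = eval {
    decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
print "strict UTF-8 decode failed\n" unless defined $as_utf8;

# Decoding with the encoding the bytes are actually in works, and the
# resulting character string can then be encoded as UTF-8 for output.
my $chars = decode('cp1252', $octets);
my $utf8  = encode('UTF-8', $chars);
printf "%d characters, %d UTF-8 bytes\n", length $chars, length $utf8;
```

The character count stays the same as the input byte count, while the UTF-8 output is one byte longer because the e-acute takes two bytes.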

If I were able to run the code, the following would be the first thing I'd try. I like the utf8::all module.

# this will make all filehandles and STDOUT prints be utf8
use utf8::all; # a very simple module
# get your excel rows and decode them
# Gives error "Cannot decode string with wide characters"
# map { $_ = decode("utf8", $_) } @row;
# decode the cp1252 stuff properly
map { $_ = decode("cp1252", $_) } @row;
# prints to the csv file will be utf8
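For anyone who wants to try the same idea without utf8::all, here is a rough, self-contained sketch using only core modules. The @row contents are invented stand-ins for one spreadsheet row, and the in-memory filehandle is there only so the example needs no real CSV file; it assumes the cells arrive as raw cp1252 bytes, which may not be true of the original script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical row of raw cp1252 bytes, standing in for one spreadsheet row
# (0xE9 is e-acute, 0x96 is an en dash in cp1252).
my @row = ("caf\xE9", "10 \x96 20", "plain");

# Decode each cell in place; a for loop avoids using map in void context.
$_ = decode('cp1252', $_) for @row;

# Write through an explicit UTF-8 layer. An in-memory handle is used here
# only so the sketch does not touch the filesystem.
my $csv = '';
open my $out, '>:encoding(UTF-8)', \$csv or die "open failed: $!";
print {$out} join(',', @row), "\n";
close $out;

printf "%d bytes written\n", length $csv;
```

If the cells are already character strings rather than bytes, the decode step should be skipped; decoding twice is one common way to end up with mangled output.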

Windows-1252 characters from \x{0080} thru \x{009f} might also be useful to you.
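The reason that range matters: cp1252 and iso-8859-1 agree on most bytes but differ for 0x80 through 0x9F, where cp1252 puts printable characters such as the euro sign and curly quotes. A small sketch showing the difference for three bytes picked as examples:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Three sample bytes from the 0x80-0x9F range: euro sign, right single
# quote, and en dash in cp1252.
for my $byte ("\x80", "\x92", "\x96") {
    my $as_cp1252 = decode('cp1252', $byte);
    my $as_latin1 = decode('iso-8859-1', $byte);
    printf "byte 0x%02X: cp1252 -> U+%04X, iso-8859-1 -> U+%04X\n",
        ord $byte, ord $as_cp1252, ord $as_latin1;
}
```

Decoding cp1252 data as iso-8859-1 turns these bytes into invisible C1 control characters, which is one way punctuation can seem to vanish.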


I'm not really a human, but I play one on earth.
Old Perl Programmer Haiku ................... flash japh

Re^2: Mixed character encoding issues
by ddaupert (Initiate) on Jul 06, 2012 at 16:48 UTC

    I appreciate the help, zentara. I did install utf8::all and used it in the packages where I open/close files. Alas, it made no difference as far as output. I got the same character mangling as before. But I am sure the utf8::all module will come in handy. Thanks for mentioning it.

    Regarding the decode suggestion, I based my usage on the Encode documentation:

    ...to convert ISO-8859-1 data into a string in Perl's internal format:

    $string = decode("iso-8859-1", $octets);

    This was just a 'Hail Mary' attempt; I am not at all convinced my input string is in octets, and when I precede this function with its mate that does a translation into octets, the output gets even more mangled.

    When I get off work today, I will look into the other links you mentioned.

    /dennis