in reply to Handling variety of languages/Unicode characters with Spreadsheet::ParseExcel

When dealing with encodings, you have to control (and check) the whole path your data takes. Most likely, you are not passing the Unicode characters to your database, or your database does not understand the Unicode characters. When retrieving the data from the database, you might retrieve octets that are UTF-8 encoded strings, but you don't decode these octets into proper Unicode characters. Also, when debugging by printing your data to the screen, your terminal might be configured for a different encoding than UTF-8, or you might output your data in a different encoding than UTF-8.
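Below is a minimal sketch of the decode step described above, using the core Encode module. The octet string is a made-up example, not data from this thread. It assumes the database driver hands back raw UTF-8 octets without decoding them.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Example octets as an undecoding driver might return them:
# "é" (U+00E9) is the two bytes C3 A9 in UTF-8.
my $octets = "caf\xC3\xA9";
print length($octets), "\n";    # 5 (bytes)

# Decode the octets into a Perl character string before working with them.
my $chars = decode('UTF-8', $octets);
print length($chars), "\n";     # 4 (characters)

# Encode back to UTF-8 octets just before the data leaves the program.
my $out = encode('UTF-8', $chars);
print $out eq "caf\xC3\xA9" ? "round trip ok\n" : "mismatch\n";
```

Decode once at the input boundary and encode once at the output boundary. If you mix undecoded octets with character strings in between, the data gets mangled.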


Re^2: Handling variety of languages/Unicode characters with Spreadsheet::ParseExcel
by richb (Scribe) on Apr 09, 2010 at 01:50 UTC

    Ah, sorry, I wasn't clear.

    The issue isn't at the database. When I edit the .SQL file my Perl script produces, I see the mangled data there.

    I didn't even get as far as running the SQL script to load the data, because I could already see that the Unicode characters weren't making it into the .SQL file properly.

      Then, are you sure that your .SQL file is UTF-8, and that the text editor you're using to view it understands UTF-8? Most likely you'll want to tell Perl to encode output to UTF-8:

      open my $sql_file, '>:encoding(UTF-8)', $sql_name or die "Couldn't create '$sql_name': $!";
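Here is a small sketch that checks what such a handle actually writes. It uses the same `open` call, then reads the file back with `:raw` to inspect the bytes. The temporary file stands in for `$sql_name`, and the table name `t` is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file standing in for the real $sql_name (deleted at exit).
my (undef, $sql_name) = tempfile(UNLINK => 1);

# A character string containing a non-ASCII character (U+00E9).
my $value = "caf\x{e9}";

# The encoding layer turns characters into UTF-8 bytes on output.
open my $sql_file, '>:encoding(UTF-8)', $sql_name
    or die "Couldn't create '$sql_name': $!";
print {$sql_file} "INSERT INTO t VALUES ('$value');\n";
close $sql_file or die "Couldn't close '$sql_name': $!";

# Read the file back as raw bytes to see what was actually written.
open my $check, '<:raw', $sql_name or die "Couldn't read '$sql_name': $!";
my $bytes = do { local $/; <$check> };
close $check;

# "é" should appear as the two-byte UTF-8 sequence C3 A9.
print index($bytes, "\xC3\xA9") >= 0 ? "UTF-8 ok\n" : "not UTF-8\n";
```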

        Thanks for your response! Yes to both Qs. The script prints the SQL to STDOUT and I redirect output to a file. The script sets the output encoding to UTF-8 with

        binmode STDOUT, ':utf8';
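The sketch below shows what a `:utf8` output layer does with each kind of string. The thread does not say whether the strings reaching `print` are decoded characters or still UTF-8 octets, so both are written here. An in-memory filehandle stands in for STDOUT so the resulting bytes can be inspected.

```perl
use strict;
use warnings;

# Decoded character string: "é" is one character, U+00E9.
my $chars  = "caf\x{e9}";
# The same text left as undecoded UTF-8 octets (two bytes for "é").
my $octets = "caf\xC3\xA9";

# In-memory handle with the same layer as: binmode STDOUT, ':utf8';
my $buf = '';
open my $out, '>', \$buf or die "Couldn't open in-memory handle: $!";
binmode $out, ':utf8';
print {$out} $chars, "\n";
print {$out} $octets, "\n";
close $out;

# Dump the bytes written for each line as hex.
for my $line (split /\n/, $buf) {
    print join(' ', map { sprintf '%02X', ord } split //, $line), "\n";
}
# 63 61 66 C3 A9
# 63 61 66 C3 83 C2 A9
```

The layer encodes each string once. That is correct for the decoded string. The already-encoded octets, however, come out double-encoded (`C3 83 C2 A9`), which shows up as mangled characters in the output file.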