in reply to DBD::Oracle - Character Encoding Conversion

kennethk's advice worked, although I needed to set the ora_charset attribute to 'WE8ISO8859P15' since that is the default charset of the Oracle DB.

I definitely understand that the characters you see are not necessarily the characters stored. The reliable way to see what is actually in the Oracle DB was to use the RAWTOHEX SQL function. The output is a bit hard to read in SQL*Plus (no spaces between the bytes), but I suppose you could read it into your Perl script and print it out better.
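As a rough sketch of what "print it out better" could mean: assuming the hex string has already been fetched into a Perl scalar (the value below is a hypothetical stand-in, not real query output), it can be split into two-digit byte groups.

```perl
use strict;
use warnings;

# Hypothetical hex string, standing in for a value fetched from the DB.
# These are the ISO-8859-15 bytes of "resumé".
my $hex = '726573756DE9';

# Break the string into two-digit groups so each byte is easy to pick out.
my $spaced = join ' ', $hex =~ /(..)/g;

print "$spaced\n";   # prints: 72 65 73 75 6D E9
```

The output is plain ASCII, so it does not depend on how the terminal renders accented characters.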

On the Perl side, Data::Dumper shows an escaped character if the charset isn't set, e.g. "resum\x{e9}", where \x{e9} is "e with acute accent" (é). With ora_charset specified, it correctly displays the UTF-8 text.

And then there is whether your console displays the characters correctly. My main console program doesn't, but I have another one that does.

Thanks for the help!

Re^2: DBD::Oracle - Character Encoding Conversion
by Anonymous Monk on Sep 21, 2015 at 18:40 UTC
    No, you probably could not "print it out better," because it would contain character escapes ... and your terminal, if nothing else, would try its best to print it "properly." I have been known to copy hex sequences to two different text files and then use 'diff' to compare them. As you say, it is very maddening, but you simply cannot 'eyeball' this stuff. There are too many unrelated things between the data and your eyeballs!
Re^2: DBD::Oracle - Character Encoding Conversion
by kennethk (Abbot) on Sep 21, 2015 at 19:46 UTC
    I'm glad things are working, but as you have both insert and select operations happening, make sure that you can insert a new string, verify its content in the DB, and extract it again correctly. With a mixed mode like that, I would be concerned that what's working now is some data that was previously corrupted.

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re^2: DBD::Oracle - Character Encoding Conversion
by Anonymous Monk on Sep 22, 2015 at 15:14 UTC

    OK I think I jumped the gun on declaring it fixed. The issue is actually addressed in a CPAN bug here.

    When I set the "ora_charset" attribute to "WE8ISO8859P15" in connect(), it stored UTF-8 strings in the column of the table. However, those strings should really be ISO-8859-15, because that is the default character set of the DB. (And the column is CHAR, not NCHAR.)

    So it looks like I'll need to convert the strings returned from the DB to UTF-8 manually. Not a big problem. If anyone has another idea, let me know.

      As a matter of etiquette, it's considered appropriate to reply to those who reply to you. In addition to maintaining a thread of conversation, PerlMonks can be configured to send a message when someone replies to one of your nodes, so it makes it more likely that the original respondent can continue assisting.

      Can you define "manually"? I've fixed corrupted files in the past using carefully crafted regular expressions, but that is usually not the right choice. Better would be to use Encode to do the transformations once you know how your input stream is encoded. Encode::Guess might be helpful there, though it has limited utility because it's such a messy problem. As some background, you should read through perluniintro if you haven't already.
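A minimal sketch of the Encode-based approach, assuming the driver hands back raw ISO-8859-15 bytes (the sample value is made up for illustration, and what DBD::Oracle actually returns depends on the connection settings):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical raw bytes, as they might arrive from a WE8ISO8859P15 column.
my $bytes = "resum\xE9";

# Decode the ISO-8859-15 bytes into a Perl character string.
my $chars = decode('iso-8859-15', $bytes);

# Encode to UTF-8 only at the output boundary (file, terminal, socket).
my $utf8 = encode('UTF-8', $chars);

printf "%d characters, %d UTF-8 bytes\n", length($chars), length($utf8);
```

Decoding once on input and encoding once on output keeps the conversions explicit, which is the opposite of stacking fixes on top of already-converted data.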

      TL;DR- What you are proposing is possible, but sounds a lot like nested band-aids.



        Pardon the breach of etiquette, I am new to these forums.

        By manually I meant explicitly. Here is the somewhat counter-intuitive code, where "$value" is from the hashref returned by a fetchall_hashref call:

        Encode::_utf8_off($value);
        $value = Encode::decode("utf8", $value);

        $value ends up being a Perl string with the utf8 flag on.
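For anyone wanting to try this outside the database, here is a small self-contained sketch of the same two steps. It assumes the fetched value arrives as raw UTF-8 bytes; the sample string is hypothetical and the real driver output may differ.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical fetched value: the UTF-8 bytes of "resumé" (7 bytes).
my $value = "resum\xC3\xA9";

# Make sure Perl treats the scalar as raw bytes, then decode them as UTF-8.
Encode::_utf8_off($value);
$value = Encode::decode("utf8", $value);

# The result is a 6-character string with the utf8 flag on.
printf "length=%d flag=%s last=U+%04X\n",
    length($value),
    (utf8::is_utf8($value) ? "on" : "off"),
    ord(substr($value, -1));
```

Note that Encode::_utf8_off is documented as an internal function, so this is a workaround sketch rather than a recommended pattern.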

        Thanks!