Locutus has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed monks,

this question might be one for the DBI:Oracle gurus amongst you:

I'd like to retrieve UTF-8 encoded data from an Oracle 10g database (Enterprise Edition Release 10.2.0.3.0 - 64bit Production, with parameters NLS_LANGUAGE=AMERICAN, NLS_TERRITORY=AMERICA, NLS_CHARACTERSET=UTF8) running in a Solaris 10 environment. Since my locale's default character set is ISO 8859-1 aka latin1 I have to prevent Oracle from re-coding output accordingly. A colleague told me that setting the environment variable NLS_LANG from inside my Perl script by

$ENV{'NLS_LANG'} = 'AMERICAN_AMERICA.UTF8';
should do the trick - and it does under Perl v5.6.1 with DBI v1.40.

However, under Perl v5.8.6 with DBI v1.601 this assignment seems to have almost no effect at all: Instead of the UTF-8 byte combo \xC3\x84 (representing U+00C4 aka LATIN CAPITAL LETTER A WITH DIAERESIS), for example, I get only the single byte \xC4 (latin1 code for U+00C4!) or - if I omit the line in question - the single byte \x41 (latin1 code for a simple A).

This experience made me suspect that modifying $ENV{'NLS_LANG'} was deprecated at some point in Perl/DBI history. Is that true?

If yes, can you please point me to some document describing the currently recommended way to make the Oracle server believe that no character re-coding is necessary?

Humble greetings
Locutus

Replies are listed 'Best First'.
Re: Is setting $ENV{'NLS_LANG'} deprecated?
by derby (Abbot) on Dec 09, 2008 at 12:02 UTC
Re: Is setting $ENV{'NLS_LANG'} deprecated?
by mje (Curate) on Dec 09, 2008 at 13:38 UTC

    I use DBI (1.601) and DBD::Oracle all the time and all my data is utf8. I have set NLS_LANG=AMERICAN_AMERICA.AL32UTF8 in my environment and it works fine. I believe NLS_LANG is still in use and still supposed to work - it is a feature picked up by the Oracle client libraries.

    Did you set NLS_LANG before connecting? Try setting it your environment and exporting it before running your script. As you don't mention how you came by the latin captital letter a with diaeresis it is difficult to say any more.

    IIRC some versions of oracle don't have support for utf8 - some versions or oracle XE I believe.

Re: Is setting $ENV{'NLS_LANG'} deprecated?
by cmdrake (Acolyte) on Dec 09, 2008 at 15:20 UTC
    Locutus,
    According to the docs, you'll need to use AL32UTF8 instead of UTF8. See: Oracle_UTF8_is_not_UTF-8 and note the following:
    Fetching Data
    
    Any data returned from Oracle to DBD::Oracle in the AL32UTF8 character set will be marked as UTF-8 to ensure correct handling by Perl.
    
    For Oracle to return data in the AL32UTF8 character set the NLS_LANG or NLS_NCHAR environment variable must be set as described in the previous section.
    
    When fetching NCHAR, NVARCHAR, or NCLOB data from Oracle, DBD::Oracle will set the Perl UTF-8 flag on the returned data if either NLS_NCHAR is AL32UTF8, or NLS_NCHAR is not set and NLS_LANG is AL32UTF8.
    
    When fetching other character data from Oracle, DBD::Oracle will set the Perl UTF-8 flag on the returned data if NLS_LANG is AL32UTF8.
    
Re: Is setting $ENV{'NLS_LANG'} deprecated?
by Tux (Canon) on Dec 09, 2008 at 15:07 UTC

    Be sure to set it bbefore/b DBD::Oracle connects, like mje suggests

    BEGIN { $ENV{NLS_LANG} = "AMERICAN_AMERICA.UTF8" }

    Be sure to have your data correctly encoded before passing it to the DBI. See what happens with the different data:

    $ perl -C2 -MDP -wle'$_="\xc3\x84";print"$_\t",DPeek$_' Ã PV("\303\204"\0) $ perl -C2 -MDP -wle'$_="\xc3\x84";utf8::decode$_;print"$_\t",DPeek$_' Ä PV("\303\204"\0) [UTF8 "\x{c4}"] $ perl -C2 -MDP -wle'$_="\x{c4}";print"$_\t",DPeek$_' Ä PV("\304"\0) $ perl -C2 -MDP -wle'$_="\x{c4}";utf8::decode$_;print"$_\t",DPeek$_' Ä PV("\304"\0) $ perl -C2 -MDP -wle'$_="\x{c4}";utf8::encode$_;print"$_\t",DPeek$_' Ã PV("\303\204"\0) $ perl -C2 -MDP -wle'$_="\x{c4}";utf8::encode$_;utf8::decode$_;print"$ +_\t",DPeek$_' Ä PV("\303\204"\0) [UTF8 "\x{c4}"]

    Enjoy, Have FUN! H.Merijn
Re: Is setting $ENV{'NLS_LANG'} deprecated?
by Locutus (Beadle) on Dec 15, 2008 at 11:26 UTC
    Thanks for sharing your wisdom, Monks!

    It's good to know that I'm not trying to do something deprecated - but, unfortunately, so far none of your replies has helped to disclose the secret behind the difference between Perl 5.6.1/DBI 1.40 and Perl 5.8.6/DBI 1.601 in this case:

    @derby: I suppose by "OCI" you meant "Oracle::OCI". Interestingly, for both(!) of the mentioned Perl versions

    $> perl -MDBD::Oracle -e 'print "$DBD::Oracle::VERSION\n"'
    returns
    Can't load '/.../<PERL_VERSION>/sun4-solaris/auto/DBD/Oracle/Oracle +.so' for module DBD::Oracle: ld.so.1: perl: fatal: libclntsh.so.9.0: +open failed: No such file or directory at /.../<PERL_VERSION>/sun4-so +laris/DynaLoader.pm line <LINE_NUMBER>.
    and
    $> perl -MOracle::OCI -e 'print "$Oracle::OCI::VERSION\n"'
    returns
    Can't locate Oracle/OCI.pm in @INC (@INC contains: ...)
    so it seems neither of these modules is used (at least by the working duo Perl 5.6.1 and DBI 1.40) anyway. See below for a complete code example if you like.

    @mje, Tux, and cmdrake: Yes, I do set NLS_LANG before connecting to the database and I tried both setting it in the environment and setting it within a BEGIN block at the beginning of my Perl script. Even replacing UTF8 by AL32UTF8 didn't help with Perl v5.8.6/DBI v1.601 (while it didn't hurt with Perl v5.6.1/DBI v1.40).

    Of course, I'm happy to share my code with you in order to provide more detailed information. I'm using a csh wrapper script

    to set some environment variables (including NLS_LANG) and to pass one single argument (the SQL statement to be executed) to the Perl script doing the DBI stuff: The database field which contains the data I'm actually interested in is declared to be of type LONG. (Please note, that I won't be allowed to modify any part of the database setup without losing support by the software producer.)

    Do you spot anything suspicious?

    Best regards
    Locutus

      Locutus, I suspect perl -MDBD::Oracle -e 'print "$DBD::Oracle::VERSION\n"' failed because your environment wasn't set. Can you try:
      #!/bin/csh # NOTE: '...' is a placeholder for one or more levels of sub setenv ORACLE_HOME "/.../oracle/product/920" setenv LD_LIBRARY_PATH $ORACLE_HOME/lib32:/usr/lib:/usr/ucb/lib setenv ORA_NLS33 $ORACLE_HOME/ocommon/nls/admin/data setenv NLS_LANG AMERICAN_AMERICA.AL32UTF8 setenv LD_PRELOAD /usr/lib/libthread.so.1 perl -MDBD::Oracle -e 'print "$DBD::Oracle::VERSION\n"'
      and give us the versions of DBD::Oracle in both environments?
        Hi cmdrake,

        first of all I'd like to apologize for letting you wait that long for a response - Christmas holidays and several new projects starting with the begin of 2009 drew off my attention and made me forget about the really interesting things, sorry!

        Indeed, printing the version of DBD::Oracle failed due to the missing environment. Thanks for the hint. And here are, finally, the results of your csh script:

        • Perl 5.6.1 uses version 1.15 of DBD::Oracle while
        • Perl 5.8.6 uses version 1.20.
        A little but significant difference?

        Best regards
        Locutus