in reply to DBD::Oracle uses Perl's internal representation of strings

I think when you added chr(1024) to $q2 it got upgraded to a unicode string and chopping it off the end does not downgrade it. I think you need to look very carefully at how you have set your NLS settings both in the database and the client (e.g. what is NLS_LANG set to) to ensure they match and then not try and insert unicode chrs into a database using iso8859 or vice versa. If you want to insert unicode chrs into an oracle database with DBD::Oracle then you'll need to set your NLS_LANG to something like AMERICAN_AMERICA.AL32UTF8. I believe this is all documented in the DBD:Oracle pod.
  • Comment on Re: DBD::Oracle uses Perl's internal representation of strings

Replies are listed 'Best First'.
Re^2: DBD::Oracle uses Perl's internal representation of strings
by pc88mxer (Vicar) on Nov 07, 2007 at 17:19 UTC
    Thanks for the response, but I think the problem is a little more subtle. The reason I think that the above script produces different results is that DBD::Oracle is ignoring Perl's internal utf8 flag that is associated with each Perl string.

    For instance, in the case of $q1, Perl is using a single octet for the character 'ä'. In the second case Perl is using two octets for that character, and the 'utf8' flag is on. If DBD::Oracle was looking at the 'utf8' flag, it really should produce the same results in both cases. Since it is not, I can only conclude that it is ignoring the utf8 flag.

    Consequently, I am not sure that any setting of NLS_LANG will fix the problem. For instance, if NLS_LANG is set to 'utf-8', then DBD::Oracle will interpret $q1 incorrectly. On the other hand, if NLS_LANG is set to 'iso-8859-1', DBD::Oracle will interpret $q2 incorrectly.

    I could be wrong about this. Using Encode::encode all the time is okay with me -- I just want to make sure that it is necessary.

      What I meant by look at your NLS settings is that OCI needs to know the encoding of the strings passed to it. If you say the strings are iso8859 they will be interpreted as is8859 and if you say they are utf8 they will be interpreted as utf8 - you have to have one or the other.

      By adding char(1024) you turned the string into unicode and so DBD::Oracle will pass utf8 strings as DBD::Oracle DOES know the string you passed it is unicode but then the NLS setting comes in to play since OCI will use that to decide the encoding of the client character-set.

      If you want to use iso8859 then make sure you NLS setting is correct and don't use unicode strings. If you want to use unicode strings then set your NLS setting to utf8 and change the iso8859 strings to unicode. Don't try and mix them without performing the conversion.