in reply to Re: DBD::Oracle uses Perl's internal representation of strings
in thread DBD::Oracle uses Perl's internal representation of strings

Thanks for the response, but I think the problem is a little more subtle. The reason I think that the above script produces different results is that DBD::Oracle is ignoring Perl's internal utf8 flag that is associated with each Perl string.

For instance, in the case of $q1, Perl is using a single octet for the character 'ä'. In the case of $q2, Perl is using two octets for that same character, and the 'utf8' flag is on. If DBD::Oracle were looking at the 'utf8' flag, it really should produce the same results in both cases. Since it does not, I can only conclude that it is ignoring the utf8 flag.
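To make the difference concrete, here is a minimal sketch in plain Perl with no database involved. The values are stand-ins I made up for illustration, not the strings from the original script: $q1 holds 'ä' as one octet, and $q2 is a copy pushed into the two-octet internal form with utf8::upgrade.

```perl
use strict;
use warnings;
use bytes ();   # loads bytes::length without enabling the bytes pragma

# Stand-in values (assumptions for illustration only).
my $q1 = "\xE4";        # 'ä' stored as a single octet, utf8 flag off
my $q2 = $q1;
utf8::upgrade($q2);     # same character, stored as two octets, utf8 flag on

for my $pair ( [ q1 => $q1 ], [ q2 => $q2 ] ) {
    my ( $name, $str ) = @$pair;
    printf "%s: chars=%d internal_octets=%d utf8_flag=%s\n",
        $name,
        length($str),              # length in characters
        bytes::length($str),       # length of the internal buffer
        utf8::is_utf8($str) ? 'on' : 'off';
}

# At the Perl level the two strings are the same one-character string.
print "q1 eq q2: ", ( $q1 eq $q2 ? "yes" : "no" ), "\n";
```

Perl itself treats the two as equal, so any code that only looks at the internal buffer and not at the flag will see two different byte sequences for what is logically one string.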

Consequently, I am not sure that any setting of NLS_LANG will fix the problem. For instance, if NLS_LANG is set to 'utf-8', then DBD::Oracle will interpret $q1 incorrectly. On the other hand, if NLS_LANG is set to 'iso-8859-1', DBD::Oracle will interpret $q2 incorrectly.

I could be wrong about this. Using Encode::encode all the time is okay with me -- I just want to make sure that it is necessary.

Re^3: DBD::Oracle uses Perl's internal representation of strings
by mje (Curate) on Nov 07, 2007 at 17:46 UTC

    What I meant by "look at your NLS settings" is that OCI needs to know the encoding of the strings passed to it. If you say the strings are iso8859, they will be interpreted as iso8859, and if you say they are utf8, they will be interpreted as utf8. You have to pick one or the other.

    By adding chr(1024) you turned the string into a Unicode string, so DBD::Oracle passes it as utf8: DBD::Oracle DOES know that the string you passed it is Unicode. The NLS setting then comes into play, since OCI uses it to decide the encoding of the client character set.

    If you want to use iso8859, then make sure your NLS setting is correct and don't use Unicode strings. If you want to use Unicode strings, then set your NLS setting to utf8 and convert the iso8859 strings to Unicode. Don't try to mix them without performing the conversion.
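As a rough sketch of what "performing the conversion" can look like on the Perl side, the snippet below uses Encode::encode to turn both kinds of string into explicit UTF-8 octets. The variable values are illustrative stand-ins, and whether this is required before handing strings to DBD::Oracle is exactly the open question in this thread. The sketch only shows that encode() gives the same octets regardless of the internal representation.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in values (assumptions for illustration only).
my $q1 = "\xE4";        # 'ä', utf8 flag off
my $q2 = "\xE4";
utf8::upgrade($q2);     # 'ä', utf8 flag on

# encode() works on characters, so the internal form does not matter.
my $octets1 = encode( 'UTF-8', $q1 );
my $octets2 = encode( 'UTF-8', $q2 );

printf "q1 as UTF-8: %s\n", join ' ', map { sprintf '%02X', ord } split //, $octets1;
printf "q2 as UTF-8: %s\n", join ' ', map { sprintf '%02X', ord } split //, $octets2;
print "identical octets: ", ( $octets1 eq $octets2 ? "yes" : "no" ), "\n";
```

The same approach with encode('iso-8859-1', ...) would give a single-octet form for both strings, provided every character in them exists in that character set.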