DBD::Oracle uses Perl's internal representation of strings

pc88mxer has asked for the wisdom of the Perl Monks concerning the following question:

Through some experimenting, I've determined that when you pass a string to DBD::Oracle, it uses Perl's internal representation of the string. For example:

use DBI;
my $dbi = DBI->connect(...);

sub select_scalar {
  my ($dbi, $sql) = @_;
  my $sth = $dbi->prepare($sql);
  $sth->execute();
  my $r = $sth->fetch_row_array();
  return $r->[0];
}

my $q1 = "select 1 from dual where 'ä' = chr(228)";
my $q2 = $q1.chr(1024);
chop($q2);

print "q1 eq q2? ", ($q1 eq $q2 ? "yes" : "no"), "\n";

my $r1 = select_scalar($dbi, $q1);
print "r1 = $r1\n";
my $r2 = select_scalar($dbi, $q2);
print "r2 = $r2\n";
[download]

This will emit:

q1 eq q2? yes
r1 = 1
r2 =
[download]

So, even though q1 and q2 are the same Perl string (character by character), they are different when passed to DBD::Oracle. A fix would be to use Encode::encode('iso-8859-1', ...) on every string passed to DBD::Oracle (query strings as well as values for ? place holders.)

My question: is there another (read 'better') way to do this? I'm using perl 5.8.0 and OCI version 8.

Comment on DBD::Oracle uses Perl's internal representation of strings Select or Download Code

Replies are listed 'Best First'.
Re: DBD::Oracle uses Perl's internal representation of strings by mje (Curate) on Nov 07, 2007 at 15:02 UTC
I think when you added chr(1024) to $q2 it got upgraded to a unicode string and chopping it off the end does not downgrade it. I think you need to look very carefully at how you have set your NLS settings both in the database and the client (e.g. what is NLS_LANG set to) to ensure they match and then not try and insert unicode chrs into a database using iso8859 or vice versa. If you want to insert unicode chrs into an oracle database with DBD::Oracle then you'll need to set your NLS_LANG to something like AMERICAN_AMERICA.AL32UTF8. I believe this is all documented in the DBD:Oracle pod.	[reply]
Re^2: DBD::Oracle uses Perl's internal representation of strings by pc88mxer (Vicar) on Nov 07, 2007 at 17:19 UTC
Thanks for the response, but I think the problem is a little more subtle. The reason I think that the above script produces different results is that DBD::Oracle is ignoring Perl's internal utf8 flag that is associated with each Perl string. For instance, in the case of $q1, Perl is using a single octet for the character 'ä'. In the second case Perl is using two octets for that character, and the 'utf8' flag is on. If DBD::Oracle was looking at the 'utf8' flag, it really should produce the same results in both cases. Since it is not, I can only conclude that it is ignoring the utf8 flag. Consequently, I am not sure that any setting of NLS_LANG will fix the problem. For instance, if NLS_LANG is set to 'utf-8', then DBD::Oracle will interpret $q1 incorrectly. On the other hand, if NLS_LANG is set to 'iso-8859-1', DBD::Oracle will interpret $q2 incorrectly. I could be wrong about this. Using Encode::encode all the time is okay with me -- I just want to make sure that it is necessary.	[reply]
Re^3: DBD::Oracle uses Perl's internal representation of strings by mje (Curate) on Nov 07, 2007 at 17:46 UTC
What I meant by look at your NLS settings is that OCI needs to know the encoding of the strings passed to it. If you say the strings are iso8859 they will be interpreted as is8859 and if you say they are utf8 they will be interpreted as utf8 - you have to have one or the other. By adding char(1024) you turned the string into unicode and so DBD::Oracle will pass utf8 strings as DBD::Oracle DOES know the string you passed it is unicode but then the NLS setting comes in to play since OCI will use that to decide the encoding of the client character-set. If you want to use iso8859 then make sure you NLS setting is correct and don't use unicode strings. If you want to use unicode strings then set your NLS setting to utf8 and change the iso8859 strings to unicode. Don't try and mix them without performing the conversion.	[reply]