jcbyrne has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to compare a string read from a file to a string retrieved from an Oracle 8i database. However, the string returned from the database seems to have NULL characters between each character, causing my comparison to fail every time. When I do a split on the string and print it out:
@test = split (//,$something); foreach (@test) {print "$_\n";}
I get a blank line between each character. If I do the following to the string before comparison:
$something =~ /\0//g;
The string acts as I would expect it to. The odd thing is that the NULLS do not appear if I run the SQL inside of a CGI script, just when it is run from the command line. Also, if I print the string to the console, the NULLS disappear, as if the print function knows how to deal with them. Has anyone ever seen anything like this?
Thanks in advance for any assistance that you can provide.

Replies are listed 'Best First'.
(tye)Re: DBI returns NULL characters in string
by tye (Sage) on Apr 24, 2001 at 18:40 UTC

    If every other character is \0, then you probably have a UTF16 string or (at least what Microsoft calls) a Unicode string.

    The only problem with just stripping the '\0's is that non-ASCII characters aren't going to be translated properly. Searching on "Unicode" or "wide characters" should find you more information to help you decide how you want to deal with these.

    Recent versions of Perl have some Unicode support (though it is still probably considered "experimental") so you could tell Perl that the string you got was Unicode and then most operations would work (for example, you could compare the string to a non-Unicode string and Perl would convert one of the strings before comparing.

            - tye (but my friends call me "Tye")
Re: DBI returns NULL characters in string
by converter (Priest) on Apr 24, 2001 at 22:26 UTC
    Any time you encounter mystery characters in your data, you should do a hex dump so that you know exactly what those characters are (guessing can work, but can bite you later if you guess wrong). If you haven't done so, try replacing this:
    @test = split (//,$something); foreach (@test) {print "$_\n";}
    with:
    for (split //, $something) { printf "%02X ", ord; }
    Given the input "J\x00O\x00H\x00N\x00" this will print:
    4A 00 4F 00 48 00 4E 00
    With that said, as far as I can tell, if your string contains embedded NULLs, your browser should interpret them as whitespace, so that it appears that there is a single space where each NULL character would be expected to appear. This would render your example as: J O H N

    If you're not seeing this behavior when interfacing with the CGI, but you are seeing it in the shell, you should probably do some testing to find out if the mystery characters show up in both environments. If not, then there's a problem with the environment in which your script is running (when running from the shell).

      I've verified that the mystery characters are indeed NULLS,
      and they only appear when running the script from the
      command line. Running it as a CGI script returns clean data.
      
      I've started looking into the differences in the
      environments, but so far have come up empty.
      
      Does anyone have any thoughts on what type of environment
      diffs could cause something like this?
      
      Thanks again.
      
      
        I just spotted this little tidbit in Programming the Perl DBI on page 299:

        The NVARCHAR2 and NCHAR variants hold string values of a defined national character set (Oracle 8 only). For those types the maximum number of characters stored may be lower when using multibyte character sets.

        ...

        Oracle automatically converts character data between the character set of the database defined when the database was created and the character set of the client, defined by the NLS_LANG parameter for the CHAR and VARCHAR2 types or the NLS_NCHAR parameter for the NCHAR and NVARCHAR2 types.

        CONVERT(string, dest_char_set, source_char_set) can be used to convert strings between character sets... See the "National Language Support" section of the Oracle Reference Manual for more details on character set issues.

        I have a hunch that the difference between the two environments has something to do with the NLS_LANG and/or NLS_NCHAR parameters. (Disclaimer: I am not a dba, so I could be barking up the wrong tree.)

Re: DBI returns NULL characters in string
by Caillte (Friar) on Apr 24, 2001 at 18:01 UTC

    I cannot duplicate this on my work's oracle database... perhaps you should put up the code you use to access the sql...

    Also, it would help to know what versions/platforms you use to test your code out, from the command line and via CGI.

    $japh->{'Caillte'} = $me;

      Here is the code that produced the strange results:
      #!/usr/bin/perl use DBI; use strict; my $sid = 'somedb'; my $user = 'someuser'; my $pass = 'somepassword'; my $dbh = DBI->connect("dbi:Oracle:$sid",$user,$pass, {AutoCommit => 0}) or die "$DBI::errstr"; my $sqlstring = "SELECT EMP_NAME,EMP_NUM FROM EMP WHERE EMP_NAME = 'JOHN'"; my $sth = $dbh->prepare($sqlstring) or die "$DBI::errstr"; my $rv = $sth->execute or die "$DBI::errstr"; my @fields = $sth->fetchrow_array(); $sth->finish; $dbh->disconnect; my @temp = split (//,$fields[0]); foreach (@temp) {print "$_\n"); # this prints the blank # lines between chars
      OUTPUT FROM CODE ABOVE:
      $>null_test.pl
      J
      
      O
      
      H
      
      N
      
      $>
      
      Software versions:
      
      Oracle8i Enterprise Edition Release 8.1.7.0.0
      perl, version 5.005_03 built for alpha-dec_osf
      $DBI::VERSION = "1.13";
      $DBD::Oracle::VERSION = '1.03';
      
      Thanks for looking at this