As the sample code below shows, I can assign two (non-ASCII/Latin-1) character string values that are supposed to be the same, and the two versions are stored differently in my database. Other experiments suggest that reading the data is not the problem, it's INSERT-ing or UPDATE-ing the data where the problem arises. Can anyone shed some light on this one (or point me to a better source?)

Version info: I'm using ActiveState Perl, version 5.008008. The operating system is Windows XP Pro, SP2. The database is MS SQL Server 2000, SP3. The DBI version is 1.50. The DBD-ODBC version is 1.13. The code below runs stand-alone, e.g., not in a Web server.

(By the way, changing the column type from varchar to ncharvar doesn't affect the results.)

Here's the code. Be warned that the code contains some UTF-8 characters that are not Latin-1: If you want to test the program, just change the DBI->connect line.

use strict; use DBI; use charnames ':full'; use vars qw($trace); print "Perl version is $]\n"; # Output is "Perl version is 5.008008" my $text1 = 'Priorit' . "\N{LATIN SMALL LETTER A WITH DIAERESIS}" . 't'; my $text2 = 'Priorität'; if ($text1 eq $text2) {print 'Equal '}; if ($text1 ne $text2) {print 'Different '}; print length($text1) . ' ' . length($text2) . "\n"; # Output is "Equal 9 9" # Connect to the database and create a database handler my $dbh = DBI->connect('dbi:ODBC:dataSource', 'userID', 'password', { RaiseError => 0, AutoCommit => 0 } ) || die $DBI::errstr; do_sql(<<'--------'); CREATE TABLE dummy1 ( id1 int NOT NULL, str1 varchar(50) NOT NULL ) ON [PRIMARY] -------- do_sql("INSERT INTO dummy1 (id1, str1) VALUES(1, '$text1')"); do_sql("INSERT INTO dummy1 (id1, str1) VALUES(2, '$text2')"); # According to the SQL Query Analyzer program, the table contains: # 1, 'Priorität' (a 10 character value) # 2, 'Priorität' (a 9 character value) my $array_ref = $dbh->selectall_arrayref('SELECT * FROM dummy1'); for my $r (@$array_ref) { print ${$r}[0] . ': ' . length(${$r}[1]) . "\n"; } # Output is # 1: 10 # 2: 9 $dbh->commit; $dbh->disconnect || warn $dbh->errstr; sub do_sql { $dbh->do($_[0]); if ($dbh->errstr()) { print STDERR 'error number = ' . $dbh->err() . "\n"; print STDERR "The following SQL statement failed:\n$_[0]\n"; $dbh->disconnect || warn $dbh->errstr; die; } }

In reply to database stores UTF8 strings inconsistently by robv

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.