Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, Does anybody have experience with inserting Unicode data into an MSSQL table from perl/DBI? I am aware of the N'' syntax (insert into table values(N'ça')) but that is basically for "literal" Unciode text. I have a string read from XML, wich is utf-8 encoded. If I then write the string to a file and open it with a Unicode-'aware' editor I see the text OK. But both when I use
$insStr = qq{insert into table (string) values(?)}; $sthIStr = $dbh->prepare($insStr); $sthIStr->execute($UTF8_txt);
or
$insStr = qq{insert into table (string) values(N'$UTF8_txt')}; $dbh->do("$insStr");
The "long" appearantly unencoded ASCII-value is in the database-column (which is datatype nvarchar, btw). Real example: fiÅŸiere instead of fiþiere (Romanain for Files). What I am I doing wrong?

Replies are listed 'Best First'.
Re: Unicode insert into mssql from perl DBD
by graff (Chancellor) on Jan 02, 2008 at 14:44 UTC
    The problem may involve your method of querying data back from the table, and/or displaying the query results, as well as (or instead of) your method of inserting data into the table. It may also involve how the table is defined and how the database server is configured.

    What are you using to view the table contents after the insertion? If you use a perl script, there's a chance that you can inspect the byte sequence as it comes out of the table, and that you can use the Encode module to make sure it gets interpreted correctly for use in a particular display.

    I don't know about MSSQL, but I know that for MySQL, there are parameters at various levels for managing the character encoding -- for the server itself (set at start-up), for a given database, for a given table, for given fields in the table. If MSSQL provides docs on character encoding and collation, you'll need to consult that.

    The particular example you give is puzzling (and I'm not sure whether the posted data accurately reflects your actual data). You might want to look at the input and output via a hex-dump tool. (You might also consider this tool, posted at the Monastery as tlu -- TransLiterate Unicode, which is helpful for referring to characters by their unicode "names" and hex codepoint values. That's the best way to talk about the data in cases like this.)

      graff, thanks for the comment. If only I could use mysql on this project.... You're right about the "example", which was more obfucsating then helpful, I admit. The way you "tell" MicroSoft SQLserver that data is Unicode is by way of the datatype, strangely enhough. Whereas in mysql any text or (var)char datatype can have e.g. a utf8 COLLATION, in MSSQL you'd use nvarchar (instead of varchar) for storing Unicode data. I am using the so called MSSQL Management Studio Express to actually see the data in the tables. I can paste 'real' Unciode data into it through the interface, as I can run an INSERT-statement with the N'unicodedata' syntax, and it works fine. The problem is somewhere in ODBC/DBD driver, there the fact that the data is to be interpreted as Unicode is lost... Grtz=JPvdV

      I had a similar problem with mysql.Although the format of the table was in iso-8859-7 when passing the string from the script which had the IO layers switched to 8859-7 the data was not passed in correctly.
      I used the Encode module and worked.here is a the actual code

      foreach $key (keys %fileindex) { my $val = encode("iso-8859-7", $fileindex{$key}); $sth = $dbh->prepare ('INSERT INTO products4 (p_url,p_desc) VALUES(?,?)'); $sth->execute ($key,$val); $sth->finish ();
      check the utf-8 flag of the Encode module maybe that helps
Re: Unicode insert into mssql from perl DBD
by rhesa (Vicar) on Jan 02, 2008 at 15:32 UTC
    I've had similar issues when using DBD::ODBC. I fixed it by switching to DBD::ADO:
    # for ADO usage use DBI; use Win32::OLE; Win32::OLE->Option( CP => Win32::OLE::CP_UTF8 ); my $dbh = DBI->connect('dbi:ADO:Provider=Microsoft.Jet.OLEDB.4.0;Data +Source=some.mdb', '','', {LongReadLen=>100000} );
    See the DBD::ADO documentation for more details, usage, and limitations.
Re: Unicode insert into mssql from perl DBD
by graff (Chancellor) on Jan 02, 2008 at 19:25 UTC
    Based on this information in your "Anonymous" reply above:
    I am using the so called MSSQL Management Studio Express to actually see the data in the tables. I can paste 'real' Unciode data into it through the interface, as I can run an INSERT-statement with the N'unicodedata' syntax, and it works fine.
    The next question would be "what encoding is being used for unicode characters in that tool?" The point is that the difference between UTF-16 and UTF-8 is critical -- converting character data back and forth between the two is a fairly simple and common task, but if you make the mistake of treating one type as if it were the other, everything goes to hell.

    I haven't used MS tools enough to figure out which ones use UTF-16 vs. UTF-8, or how (or whether) they provide flexibility in this regard, but I do know that MS tools can be prone to use the encoding that you might not expect.

Re: Unicode insert into mssql from perl DBD
by Anonymous Monk on Jan 02, 2008 at 21:44 UTC

    Use DBD::ODBC 1.14 or newer, enable Unicode support (is enabled on Win32, is disabled on Non-Win32) and make sure that unicode data passed in via placeholders is properly tagged as utf8 in Perl.

    Alexander Foken

      Thanks, I'm going to give that a try! ADO works okay, but has some annoying drawbacks (e.g. I couldn't use a column named "language").
      I`ve tried both approaches. DBD-ODBC 1.14 is not in the activeState repository unfortunately and I have not succeeded in building it my self. Using Ado has poved more promissing, but I faced entirely new issues. (e.g. consecutive executes of a prepared insert will chop off values in varchar columns to the length of the first insert.... interesting). Life's a workaround.... Thanks for all the advice