ajaykumarb has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am working with below environment..

Perl 5.8.9
Windows 2003
SQL Server 2005

I have to read data from a UTF-8 text file and insert it to database. Data is seperated by colon in iput file.

I am performing below steps...
1. Connecting to database and preparing an sql statement
2. Reading the data from a file line by line and decoding it using decode_utf8() method
2. Splitting the data and reading into variables
3. Executing the sql statement
4. Closing database connection

When i insert data without setting $dbh->{odbc_default_bind_type} = 12, It is throwing an error "MicrosoftODBC SQL Server DriverInvalid character value for cast specification (SQL-22018)". So i had set it to 12 and it is working fine. I do have Thai characters in input data file. The issue is when i insert Thai characters, they are getting inserted with some junk characters. When i execute the same query in database manually, it is inserting properly.

In database, fields are with datatypes nvarchar, datetime, int and float.

Below is the code snippet, working fine with out Thai characters...

#!/usr/bin/perl use DBI; use Encode; my $AbsFileName = "E:\\sample\.txt"; open (A123, "$AbsFileName"); my ($DSN,$dbh,$sth); $dbh = DBI->connect("dbi:ODBC:DSN=<dsn name>;UID=<username>;PW +D=<password>") or die "$DBI::errstr\n"; $dbh->{odbc_default_bind_type} = 12; # SQL_VARCHAR my $sth = $dbh->prepare(<<SQL) or die "$DBI::errstr\n"; INSERT INTO BK_imageINfo (CustFld1,CustFld2,CustFld3,CustFld4, +CustFld5,CustFld6,CustFld7,CustFld8,CustFld9,CustFld10,CustFld11,Cust +Fld12,Efield1,Efield2,Efield3,Efield4,Efield5) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) SQL while (<A123>) { my $data = decode_utf8( $_ ); chomp($data); my ($cfd1, $cfd2, $cfd3, $cfd4, $cfd5, $cfd6, $cfd7, $cfd8 +, $cfd9, $cfd10, $cfd11, $cfd12$enr1, $enr2, $enr3, $enr4, $enr5) = s +plit /:/, $data; ## $enr1-5 contains Thai characters... In database, Thai c +har need to be inserted as N'Thai chars'..So prepending with 'N' my $enr11 = "N". $dbh->quote( $enr1 ); my $enr21 = "N". $dbh->quote( $enr2 ); my $enr31 = "N". $dbh->quote( $enr3 ); my $enr41 = "N". $dbh->quote( $enr4 ); my $enr51 = "N". $dbh->quote( $enr5 ); open (ABC, ">>query.txt"); my $sql = "INSERT INTO BK_imageINfo (CustFld1,CustFld2,Cus +tFld3,CustFld4,CustFld5,CustFld6,CustFld7,CustFld8,CustFld9,CustFld10 +,CustFld11,CustFld12,Efield1,Efield2,Efield3,Efield4,Efield5) VALUES +('$cfd1', '$cfd2', '$cfd3', '$cfd4', '$cfd5', '$cfd6', '$cfd7', '$cfd +8', '$cfd9', '$cfd10', '$cfd11', '$cfd12', '$enr11', '$enr21', '$enr3 +1', '$enr41', '$enr51')"; print ABC "\n$sql\n"; close ABC; $sth->execute($cfd1, $cfd2, $cfd3, $cfd4, $cfd5, $cfd6, $c +fd7, $cfd8, $cfd9, $cfd10, $cfd11, $cfd12, $enr11, $enr21, $enr31, $e +nr41, $enr51) or die "$DBI::errstr\n"; } close A123; #Close the database $sth->finish; $dbh->disconnect;

Please help me in inserting Thai characters into SQL Server database?


Replies are listed 'Best First'.
Re: Issue while inserting Thai Characters into Database..
by afoken (Chancellor) on Jan 12, 2010 at 10:17 UTC
    • Make sure you are running the latest stable version of DBD::ODBC (currently 1.23).
    • Make sure it is build with Unicode support (Default enabled on Windows, Default disabled on non-Windows systems).
    • Read the documentation of DBD::ODBC, especially the parts documenting Unicode support.
    • Read it again.
    • Really, read it again. DBD::ODBC supports Unicode (I wrote the initial patch, and MJE improved it), but don't try to be smarter than DBD::ODBC. Just pass the perl strings to DBD::ODBC.
    • Have a look at the unicode test scripts t/40UnicodeRoundTrip.t and t/41Unicode.t for example code.
    • Make sure all strings passed to $sth->execute contain the characters in perls own internal representation (i.e. UTF-8, with UTF-8-Flag set).
    • Don't mess with $dbh->quote, execute() takes care of properly quoting the values. You are quoting twice, that won't work at all. You do not need the N"xxx" notation, DBD::ODBC and the ODBC driver will take care of all that.
    • Get rid of $dbh->{odbc_default_bind_type} = 12; # SQL_VARCHAR. It won't work. It is the wrong data type (no Unicode support).
    • And by the way, use the three-arument form of open with a "my" variable, like this: open my $handle,'>>','file.ext' or die "Can't open file.ext for writing: $!".

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      I have little to add to Alexander's fine comments other than to reiterate it is perfectly possible to insert UTF-8 encoded perl scalars into MS SQL Server with DBD::ODBC.

      Go through his list, make the corrections he suggests and examine the unicode examples in the t/ subdirectory of DBD::ODBC.

      In addition to what Alexander said you can avoid that call to decode_utf by adding a filter to your open command e.g., open(FH, "<:encoding(UTF-8)", "file") (see open)

      Do upgrade to the latest version of DBD::ODBC if you can as many issues have been fixed since the version you are running.

      If you follow this advice and still have a problem try and create a small complete example which does not work, include schema definition, run the code you submit and include the output.

      Hi Alexander,

      Thanks for your response.
      Currently i am using DBD::ODBC version 1.16, it came defaultly with Perl 5.8.9. Please advise whether it supports UTF-8 characters?
      If not, how can i get the compiled version of 1.23? I searched in CPAN, but it is not compiled version...

      All strings which i am passing to $dbh-> execute() are in UTF-8 only, because i tried writing them to a file, they are written properly.

      The problem is, if i don't mention $dbh->{odbc_default_bind_type} = 12; # SQL_VARCHAR, then it is throwing 'MicrosoftODBC SQL Server DriverInvalid character value for cast specification (SQL-22018)' error. How can i get rid of it?

      Please advise. Thanks for your valuable time.

      Thanks,
      Ajay
        Currently i am using DBD::ODBC version 1.16, it came defaultly with Perl 5.8.9. Please advise whether it supports UTF-8 characters?

        Learn how to read the documentation. DBD::ODBC has a changelog documenting all significant changes, including major updates like Unicode support, and of course bugfixes. DBD::ODBC 1.16 has some Unicode support, but with more issues and bugs and less features than 1.23. Depending on how your copy of DBD::ODBC was compiled, Unicode support may be disabled. Read my previous posting about that. There is a documented way to test for Unicode support in DBD::ODBC, see odbc_has_unicode in the DBD::ODBC documentation.

        If not, how can i get the compiled version of 1.23? I searched in CPAN, but it is not compiled version...

        Of course there is no compiled version. Perl is distributed as source code, as are the Perl modules. You need to compile it yourself, matching your specific machine, operating system, and environment. Typically, you invoke the cpan command line utility with the exact name of the module you wish to install, and it takes care of everything. Read the Perl documentation, start with http://perldoc.perl.org/perlfaq8.html#How-do-I-install-a-module-from-CPAN%3F.

        <update>Of course, that will bite you when you compile on non-Windows systems, because DBD::ODBC's default is to disable Unicode support on non-Windows systems.</update>

        All strings which i am passing to $dbh-> execute() are in UTF-8 only, because i tried writing them to a file, they are written properly.

        Writing them to a file does not say anything about their representation in Perl. Read and UNDERSTAND the Perl Unicode documentation. Start with perluniintro and follow the links in the "See Also" section. Maybe you are still working with bytes representing an UTF-8 string (UTF-8 flag off) instead of Unicode characters (UTF-8 flag on). DBI requires the latter or you will get very frustrating results. See also the two test scripts in my previous posting.

        The problem is, if i don't mention $dbh->{odbc_default_bind_type} = 12; # SQL_VARCHAR, then it is throwing 'MicrosoftODBC SQL Server DriverInvalid character value for cast specification (SQL-22018)' error. How can i get rid of it?

        Read and UNDERSTAND my previous posting. Just remove the entire line. Remove the manual quoting lines, too. Pass strings with UTF-8 flag set to the DBI methods and everything will be well.

        I've worked for years with DBD::ODBC and MS SQL Server, and since I added the Unicode patch to DBD::ODBC that was included in DBD::ODBC 1.14, you can simply pass Perl Unicode strings as parameters, without any tricks required. The various fetch methods will return Perl Uncode strings, also without any tricks required. MJE added Unicode support for more DBI methods, but that is not relevant for your problem. Look at the two test scripts, and UNDERSTAND how the ASCII and Unicode strings are passed from and to the DBI API.


        I will gladly help you to UNDERSTAND what happens, even if we have to discuss every single line of the DBI source code, the DBD::ODBC source code and of your script. But I won't "just fix your code" by sprinkling some magic powder over it (just imagine a 130 kg Tinkerbell -- it won't work ;-)). It is that "Give a man a fish, you have fed him for today. Teach a man to fish, and you have fed him for a lifetime." thing. If I just fix your code for you now, you will be here again in five minutes and ask for help for the next 0.000001% step of your problem, and again and again and again. It ends with me (or the other perlmonks) doing your job for free, instead of being able to help someone who really wants to learn. So, I will try to teach you how to solve this problem by yourself, hoping that you will learn how to solve future problems with less help from others. Careful reading of documentation and answers you were given is part of the lesson you have to learn.

        Of course, Perl is a complex beast, and just replacing perlmonks.org with a big fat "RTFM" page won't help. Nobody will ever be able to solve all of his perl problems all by himself. Discussion is needed, and many people have learned a lot from other people spending their time here.

        It is a problem of attitude, and I've recently become a little bit allergic to that "Please advise" phrase in the meaning of "Just fix my f*cking code, I don't care why it doesn't work". Because that attitude won't work here, at least not for long.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Issue while inserting Thai Characters into Database..
by mje (Curate) on Jan 19, 2010 at 08:55 UTC

    I was thinking about your problem on the way to work this morning and realised I might have missed something. Are you definitely saying that if you do not set odbc_default_bind_type you always get an error SQL-22018? odbc_default_bind_type should only be used if a) SQLDescribeParam is not supported or b) SQLDescribeParam fails. I suspect this is because you are using an older DBD::ODBC which did not behave this way and upgrading to 1.23 will ignore odbc_default_bind_type with MS SQL Server in your case.

      Hi Alexander/mje

      Thanks for your responses and guidance.
      Now i have changed to 3 argument open statement, by this avoided using decode_utf8() method.
      I have downloaded DBD::ODBC v1.23 . I am also installing VC++ to compile it.
      I have gone through READMEs of both 1.16 and 1.23, i see odbc_utf8_on flag in latest version. I will continue solving my problem after installing 1.23 and update the status.

      Thanks,
      Ajay

        Best of luck - will wait to hear what progress you make.