in reply to Re^3: MS Access Input -> Japanese Output
in thread MS Access Input -> Japanese Output

Hi Almut,

Thank you for all your help. I really do appreciate it. You were right. I am receiving ascii from the MySQL side. The output barring a couple of 4e41's is mainly 3f3f3f3f's. :(

A really puzzling thing though is that while I was waiting for your response I managed to connect to the MySQL database with Java/JSP and it outputs to the webpage all of the Japanese.

But as far as setting up the database in MySQL during the installation of the MySQL server 5.0 (v5.0.27) I chose the following parameters:

  1. Developer Machine
  2. Multifunctional Database
  3. InnoDB Tablespace Settings - I just chose where to store the data files.
  4. Decision Support(DSS)/OLAP - low number of connections cause at this stage it's just me mucking around with it.
  5. Enabled TCP/IP Networking and Strict Mode
  6. Best Support for Multilingualism - Make UTF8 the default character set (it even has Nihongo in Kanji which is a nice touch to this option)
  7. Install as a windows service and Include BIN directory in Windows PATH - both ticked.
  8. Set the root user password....etc....

So the 'default character set' option is probably the most relevant.

I then used Navicat which is a client that can connect to a MySQL database server; to Import all of the tables from MS Access into MySQL. First I did this into the 'test' database that is automatically generated by MySQL during it's installation but doesn't have any tables in it.

When I opened an imported table I was overjoyed to see Japanese characters and not question marks. I then checked the tables in Navicat's design view and found the following (relevant) parameters:
Character set: utf8
Collation: utf8_general_ci

So it would appear that the data is being stored in the MySQL database as utf8 based on the settings appearing in Navicat but Perl says it is ascii based on all of those hex codes returned and the display of question marks.

The thing I don't get is that Java was able to do it on the JSP page. This makes me think that maybe it is an IIS thing. As I previously mentioned ASP works ok with a MS Access file but shows the same behaviour (i.e. the question marks) as Perl when used with the MySQL database.

Perhaps the JSP works only because it is actually going through the embedded Tomcat webserver in the NetBeans IDE.

What do you reckon? Looks like IIS eh? God I am so sorry!!! I was sure you would not reply last time. Thrown in the idiot user basket - "Are you sure your computer's plugged in..." scenario...

But thank you so much for your patience and guidance. I will look further into the IIS angle. A quick google looks like this may well be the case.

My sincerest apologies for the time you may have wasted on this but I really do appreciate you sticking with it for this long so we could at least narrow it down to a possible IIS problem.

Thank you!!!

  • Comment on Re^4: MS Access Input -> Japanese Output

Replies are listed 'Best First'.
Re^5: MS Access Input -> Japanese Output
by almut (Canon) on Nov 23, 2006 at 17:56 UTC

    considering what you described, I'd think it's a DBD::mysql driver thing...

    In fact, the module's docs mention an attribute

    mysql_enable_utf8

    This attribute determines whether DBD::mysql should assume strings stored in the database are utf8. This feature defaults to off. (...)

    Sounds promising, doesn't it... ;)

      Aaagh I spoke too soon. Have just gone through the pleasure but learning experience :) of installing Apache with Perl supported and lo and behold those beautiful question marks are still there when outputting to Firefox.

      mysql_enable_utf8 sounds very promising...so can you tell me how to turn it on in the code that I was using before?

      I am hoping you are going to say something other than:

      ... mysql_enable_utf8; ....

      because I tried that and am still getting question marks.

      Although thankfully it mentions in the doco that you have to use it in your connect statement. But not sure where it would go.

      Also does this mean I should be using something like:

      my $dbh = DBD->connect('DBD<stuff here>:ODBC:japmysql','','')
      or should it be left as:
      my $dbh = DBI->connect('DBI:ODBC:japmysql','','')

      Update:
      I managed to find an example on the web of how to do it so my code now looks like this:

      #!C:/Perl/bin/perl use DBI; use DBD::mysql; use CGI qw/:standard :html3/; use CGI::Carp qw(fatalsToBrowser); my $query = CGI -> new; use Encode "encode"; sub U2Entity { return '&#x'.unpack("H*", encode("ucs2be", shift)).';'; } sub Any2Hex { return unpack("H*", shift); } print header(-charset=>'utf-8'); print "<html><body>\n"; my $dbh = DBI->connect('DBI:ODBC:japmysql','','', {mysql_enable_utf8=> +1}) or die "Cannot connect: $DBI::errstr\n"; $dbh->{mysql_enable_utf8} or die "couldn't init mysql_enable_utf8"; my $sth = $dbh->prepare('Select English, Kana, Kanji from Vocab') or die("Couldn't prepare"); $sth->execute or die ("Could not execute SQL statement."); if($sth->rows == 0) { print "No information."; } else { print "<table border=1>\n"; while( $resptr = $sth->fetchrow_hashref() ) { print "<tr>"; print "<td>" . $resptr->{"English"}; print "<td>" . $resptr->{"Kana"}; print "<td>" . $resptr->{"Kanji"}; print "\n"; } print "</table>\n"; } print "</body></html>\n"; $dbh->disconnect;
      Unfortunately it dies at:
      $dbh->{mysql_enable_utf8} or die "couldn't init mysql_enable_utf8";
      Any ideas? By the way, where do subs usually go? Is it at the bottom of the file or at the top? Thanks again Almut.

        Arghh, seems I spoke too soon, too... now that I'm realizing you're actually using the DBD::ODBC driver, not the DBD::mysql one, as I thought ;(

        In other words, the driver specific attribute mysql_enable_utf8 1 most likely won't apply here... But maybe there's some similar option with ODBC (?) However, a quick string search through the docs for "utf"/"unicode"/"wide" doesn't find any mention of either of those :( This usually means that it's (a) not supported at all, or (b) should work out of the box... Let's hope it's closer to the latter!

        I'm afraid I can't help you much with this any more, as I haven't really done a lot with databases under Windows myself (I'm more of a 'unix gal'...).

        Well, I'd do some more googling on the issue of "ODBC + utf8 / unicode", or some such. And if it should really turn out to be unsupported (which I can't quite believe), then you could still try to use the DBD::mysql driver instead, which - as I understand things - should be available for Windows as well...

        If this doesn't get you anywhere, I'd just start a new thread here -- I guess there aren't many Monks reading "Re^N:" levels with N > 4, in old threads ...

        Good luck anyway!

        1 which you would have specified like this, for example

        $dbh= DBI->connect("DBI:mysql:dbname;mysql_enable_utf8=1", ...) or $dbh->{mysql_enable_utf8} = 1;