in reply to Re^2: MS Access Input -> Japanese Output
in thread MS Access Input -> Japanese Output

Presuming the data you get from mysql actually is in UTF-8, you could try two alternatives:

(1) - The straightforward approach: add a line binmode STDOUT, ":utf8"; (e.g. before the line print header(-charset=>'utf-8');

This might already suffice, as it's supposed to make sure that the CGI output is being written as UTF-8 (theoretically, there could still be a few other pitfalls, though...)

(2) - The safe approach: add a little conversion function

use Encode "encode"; sub U2Entity { return '&#x'.unpack("H*", encode("ucs2be", shift)).';'; }

and then modify the lines where you output the non-english table cells to read: print "<td>" . U2Entity( $resptr->{"Kana"} ); , etc.

In this case you no longer need to declare -charset=>'utf-8', as now everything is output in plain ASCII -- for example the Hiragana 'a' would be converted to its HTML entity representation &#x3042;

If both of these approaches do not work (which might well be the case), then the data from mysql most likely isn't in UTF-8. To figure out what you're actually being passed from the mysql side, try the following:

Add another little function

sub Any2Hex { return unpack("H*", shift); }

and then use that one instead of U2Entity(), i.e. print "<td>" . Any2Hex( $resptr->{"Kana"} );

and report back which hex values you're getting... :)

(In case you're merely seeing 3f (the hex value of the ASCII code representing the question mark character), then the problem is most likely the mysql setup... --> How did you create your mysql tables, etc...?)

Cheers,
Almut

BTW, just as an aside: to create a proper, complete HTML file, you'll probably also want to output the <html> and <body> tags...

Replies are listed 'Best First'.
Re^4: MS Access Input -> Japanese Output
by Zettai (Acolyte) on Nov 23, 2006 at 17:23 UTC
    Hi Almut,

    Thank you for all your help. I really do appreciate it. You were right. I am receiving ascii from the MySQL side. The output barring a couple of 4e41's is mainly 3f3f3f3f's. :(

    A really puzzling thing though is that while I was waiting for your response I managed to connect to the MySQL database with Java/JSP and it outputs to the webpage all of the Japanese.

    But as far as setting up the database in MySQL during the installation of the MySQL server 5.0 (v5.0.27) I chose the following parameters:

    1. Developer Machine
    2. Multifunctional Database
    3. InnoDB Tablespace Settings - I just chose where to store the data files.
    4. Decision Support(DSS)/OLAP - low number of connections cause at this stage it's just me mucking around with it.
    5. Enabled TCP/IP Networking and Strict Mode
    6. Best Support for Multilingualism - Make UTF8 the default character set (it even has Nihongo in Kanji which is a nice touch to this option)
    7. Install as a windows service and Include BIN directory in Windows PATH - both ticked.
    8. Set the root user password....etc....

    So the 'default character set' option is probably the most relevant.

    I then used Navicat which is a client that can connect to a MySQL database server; to Import all of the tables from MS Access into MySQL. First I did this into the 'test' database that is automatically generated by MySQL during it's installation but doesn't have any tables in it.

    When I opened an imported table I was overjoyed to see Japanese characters and not question marks. I then checked the tables in Navicat's design view and found the following (relevant) parameters:
    Character set: utf8
    Collation: utf8_general_ci

    So it would appear that the data is being stored in the MySQL database as utf8 based on the settings appearing in Navicat but Perl says it is ascii based on all of those hex codes returned and the display of question marks.

    The thing I don't get is that Java was able to do it on the JSP page. This makes me think that maybe it is an IIS thing. As I previously mentioned ASP works ok with a MS Access file but shows the same behaviour (i.e. the question marks) as Perl when used with the MySQL database.

    Perhaps the JSP works only because it is actually going through the embedded Tomcat webserver in the NetBeans IDE.

    What do you reckon? Looks like IIS eh? God I am so sorry!!! I was sure you would not reply last time. Thrown in the idiot user basket - "Are you sure your computer's plugged in..." scenario...

    But thank you so much for your patience and guidance. I will look further into the IIS angle. A quick google looks like this may well be the case.

    My sincerest apologies for the time you may have wasted on this but I really do appreciate you sticking with it for this long so we could at least narrow it down to a possible IIS problem.

    Thank you!!!

      considering what you described, I'd think it's a DBD::mysql driver thing...

      In fact, the module's docs mention an attribute

      mysql_enable_utf8

      This attribute determines whether DBD::mysql should assume strings stored in the database are utf8. This feature defaults to off. (...)

      Sounds promising, doesn't it... ;)

        Aaagh I spoke too soon. Have just gone through the pleasure but learning experience :) of installing Apache with Perl supported and lo and behold those beautiful question marks are still there when outputting to Firefox.

        mysql_enable_utf8 sounds very promising...so can you tell me how to turn it on in the code that I was using before?

        I am hoping you are going to say something other than:

        ... mysql_enable_utf8; ....

        because I tried that and am still getting question marks.

        Although thankfully it mentions in the doco that you have to use it in your connect statement. But not sure where it would go.

        Also does this mean I should be using something like:

        my $dbh = DBD->connect('DBD<stuff here>:ODBC:japmysql','','')
        or should it be left as:
        my $dbh = DBI->connect('DBI:ODBC:japmysql','','')

        Update:
        I managed to find an example on the web of how to do it so my code now looks like this:

        #!C:/Perl/bin/perl use DBI; use DBD::mysql; use CGI qw/:standard :html3/; use CGI::Carp qw(fatalsToBrowser); my $query = CGI -> new; use Encode "encode"; sub U2Entity { return '&#x'.unpack("H*", encode("ucs2be", shift)).';'; } sub Any2Hex { return unpack("H*", shift); } print header(-charset=>'utf-8'); print "<html><body>\n"; my $dbh = DBI->connect('DBI:ODBC:japmysql','','', {mysql_enable_utf8=> +1}) or die "Cannot connect: $DBI::errstr\n"; $dbh->{mysql_enable_utf8} or die "couldn't init mysql_enable_utf8"; my $sth = $dbh->prepare('Select English, Kana, Kanji from Vocab') or die("Couldn't prepare"); $sth->execute or die ("Could not execute SQL statement."); if($sth->rows == 0) { print "No information."; } else { print "<table border=1>\n"; while( $resptr = $sth->fetchrow_hashref() ) { print "<tr>"; print "<td>" . $resptr->{"English"}; print "<td>" . $resptr->{"Kana"}; print "<td>" . $resptr->{"Kanji"}; print "\n"; } print "</table>\n"; } print "</body></html>\n"; $dbh->disconnect;
        Unfortunately it dies at:
        $dbh->{mysql_enable_utf8} or die "couldn't init mysql_enable_utf8";
        Any ideas? By the way, where do subs usually go? Is it at the bottom of the file or at the top? Thanks again Almut.