I am trying to take some Japanese vocabulary from a MS Access file and then first print it as output to the screen but eventually I want to put it on a website.

You need to know which character encoding is being used for Japanese in the MS Access file. (I wouldn't really know; cp932 seems likely, or shiftjis may also work, but you should try to confirm that somehow. Check out Encode::Guess.)
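For what it's worth, here is a minimal sketch of how Encode::Guess could be used for that check. The helper name `guess_japanese_encoding` and the sample bytes are my own assumptions for illustration; in practice you would feed it a raw value fetched from the Access table. Note that cp932 and shiftjis agree on ordinary text, so listing both as suspects would most likely come back as ambiguous. The sketch only tries to separate the Shift-JIS family from EUC-JP and UTF-8.

```perl
use strict;
use warnings;
use Encode::Guess;   # exports guess_encoding()

# Hypothetical helper (not from the original post): returns the name of
# the guessed encoding, or undef if the guess fails or is ambiguous.
sub guess_japanese_encoding {
    my ($bytes) = @_;
    # The default suspects (ascii, utf8, BOM-marked UTF-16/32) are always
    # tried; euc-jp and shiftjis are added as extra suspects here.
    my $guess = guess_encoding( $bytes, qw/euc-jp shiftjis/ );
    # On failure guess_encoding() returns an error string, not an object.
    return ref $guess ? $guess->name : undef;
}

# Sample input (an assumption): the word "nihongo" as Shift-JIS bytes.
my $sample = "\x93\xFA\x96\x7B\x8C\xEA";
my $name   = guess_japanese_encoding($sample);
print defined $name ? "Looks like: $name\n" : "Could not guess\n";
```

Guessing gets less reliable on very short strings, so it is worth running it over several rows before trusting the answer.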

And what sort of "screen" are you talking about? Is it an app that has the appropriate fonts and can correctly display the Japanese text data from Access? If so, it is presumably using the same character encoding that Access is using, and maybe you just want to preserve that encoding, even when putting the data onto a web page.

Preserving the existing encoding is easy enough -- just don't do anything but fetch the data and pass it along as-is. If you have reasons for converting it to Unicode, utf8 is the best encoding for that (it's what Perl uses internally, so you start with conversion to utf8 anyway). Note that you need a utf8-capable display to view such data. (It sounds like you have such a display tool already, since you mentioned seeing "question marks" where you expected Hiragana and Kanji -- that's what you get when a utf8-based display is given non-utf8 data.)
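To make the two options concrete, here is a rough sketch of "pass it along as-is" versus "convert to utf8" when preparing data for a web page. The helper `prepare_for_page`, its parameters, and the `Shift_JIS` charset label are assumptions of mine, not anything from the original code; the point is only that whatever charset the page declares has to match the bytes actually sent.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not from the original post.
#   $raw       : bytes exactly as fetched from the database
#   $src_enc   : Perl's name for the encoding those bytes are assumed to use
#   $src_label : charset label to declare if the bytes are passed through
#   $convert   : true to re-encode as UTF-8, false to leave the bytes alone
# Returns the charset label to declare and the bytes to send.
sub prepare_for_page {
    my ( $raw, $src_enc, $src_label, $convert ) = @_;
    return ( $src_label, $raw ) unless $convert;     # option 1: untouched
    my $chars = decode( $src_enc, $raw );            # bytes -> characters
    return ( 'UTF-8', encode( 'UTF-8', $chars ) );   # characters -> UTF-8 bytes
}

# Sample input (an assumption): "nihongo" as cp932 bytes.
my $raw = "\x93\xFA\x96\x7B\x8C\xEA";

my ( $charset, $body ) = prepare_for_page( $raw, 'cp932', 'Shift_JIS', 1 );
print "Content-Type: text/html; charset=$charset\n\n";
```

Because `$body` is already a string of encoded bytes, it should be printed to a filehandle that has no `:utf8` layer; otherwise it would be encoded a second time.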

You would want to convert to utf8 if you intend to do regex matching, and/or substitutions, and/or any sort of character-based (rather than byte-based) manipulation on strings. Doing this sort of thing on non-Unicode Japanese text is a risky business at best -- it's possible (and not that hard) to corrupt the data beyond recognition or repair.
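A small illustration of that risk, assuming the data is in cp932: the katakana character SO is stored as the two bytes 0x83 0x5C, and 0x5C on its own is an ASCII backslash. The variable names below are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Katakana SO (U+30BD) as cp932 bytes; the second byte is also ASCII "\".
my $bytes = "\x83\x5C";
my $chars = decode( 'cp932', $bytes );   # now a one-character Perl string

my $byte_len = length $bytes;   # counts bytes
my $char_len = length $chars;   # counts characters

# Looking for a literal backslash:
my $hit_in_bytes = ( $bytes =~ /\\/ ) ? 1 : 0;   # matches half a character
my $hit_in_chars = ( $chars =~ /\\/ ) ? 1 : 0;   # no backslash present

print "bytes: length $byte_len, backslash match $hit_in_bytes\n";
print "chars: length $char_len, backslash match $hit_in_chars\n";
```

A substitution that "escapes" or strips backslashes would cut that character in half if it were run on the raw bytes, which is the kind of corruption that is hard to undo afterwards.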

"ascii1" is not a valid designation for any sort of character encoding. (How did you come up with that?)

Anyway, let's assume that the Access database has stuff in cp932. Here's how you'd adjust the OP code to output the data as utf8:

    use DBI;
    use Encode;

    binmode STDOUT, ":utf8";  # this will avoid warnings on output

    my $dbh = DBI->connect('DBI:ODBC:japan','','')
        or die "Cannot connect: $DBI::errstr\n";
    my $sth = $dbh->prepare('Select English, Kana, Kanji from Vocab')
        or die "Cannot prepare: $DBI::errstr\n";
    $sth->execute
        or die "Cannot execute: $DBI::errstr\n";

    my $rownum = 1;
    while ( my ($eng,$kana,$kanji) = $sth->fetchrow_array() ) {
        # $eng is presumably ASCII already -- no conversion needed
        $_ = decode( 'cp932', $_ ) for ( $kana, $kanji );
        printf( "%d:\t%s\t%s\t%s\n", $rownum++, $eng, $kana, $kanji );
    }
    $dbh->disconnect;

(not tested, but should be close to what you need)

In reply to Re: MS Access Input -> Japanese Output by graff
in thread MS Access Input -> Japanese Output by Zettai
