First, I am new to Perl. I have searched everywhere for an answer to this question without much luck, so I thought I would ask you enlightened people.

I have a new application, written entirely in Java (groan), that works in Japanese for a major corporation. It turns out, though, that the company's reporting structure (extracts, formatting, etc.) is done entirely in Perl 5.6.0. This is what Perl is excellent at, so it should be fine. The problem boils down to a legacy system, and I am not sure what to do about it. I have Perl 5.6.0, the Oraperl module to connect to Oracle, and Jcode to deal with the Japanese encodings.

All should be well, but what I am finding is that Perl, even with use utf8, does not handle UTF-8 well. We are extracting garbage out of the database that is consistent with UTF-8 / Shift_JIS bytes being displayed one byte per character. In other words, it is how a kanji character looks if you do not have the font. Oracle 8i is set to run in UTF-8 and is preserving the encoding correctly, and Java seems to handle the data correctly (our application runs in Japanese), but Perl does nothing with this information.

First, if we write the data out to a text file, all encoding is lost. I know there is a way to ensure that UTF-8 is used for output in Perl 5.8, but we cannot upgrade. Second, it actually appears that Oraperl is dropping the encoding information whenever we retrieve data. I am fairly certain of this because when we call Jcode to convert the character sets, the input is there, but it does not transform the characters and we end up with an empty string.
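The "garbage" described above is usually called mojibake. As a minimal sketch, here is a language-neutral illustration written in Python (it is not Perl 5.6 or Oraperl code). The sample string 日本 ("Japan") is an assumed example. The script prints the raw bytes of the string in UTF-8 and in Shift_JIS, then shows what the UTF-8 bytes look like when a program decodes them one byte per character as Windows-1252.

```python
# Illustration only: the sample text is an assumption, not data from the post.
text = "日本"  # "Japan"

utf8_bytes = text.encode("utf-8")      # common kanji take 3 bytes each in UTF-8
sjis_bytes = text.encode("shift_jis")  # and 2 bytes each in Shift_JIS

# Hex dumps make the two encodings easy to tell apart.
print("UTF-8:    ", utf8_bytes.hex(" "))
print("Shift_JIS:", sjis_bytes.hex(" "))

# Mojibake: the UTF-8 bytes reinterpreted as one Windows-1252 character each.
garbled = utf8_bytes.decode("cp1252")
print("As cp1252:", garbled)

# As long as no bytes were dropped, the damage is reversible.
restored = garbled.encode("cp1252").decode("utf-8")
print("Restored: ", restored)
```

A hex dump of whatever comes back from the database is one way to check whether the bytes still look like UTF-8 before suspecting the conversion step. In Perl 5.6, unpack("H*", $string) produces a similar hex string.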

What I am asking is whether anyone with experience using UTF-8, Oracle 8i, Perl 5.6, Oraperl, and Japanese can shed some light my way. I am having a heck of a time.
Thank you very much in advance for any information,

Akira Yamashita


In reply to UTF-8, Oracle and Perl life by Akira71
