I'm working on a programming project where the code is documented in Japanese (S-JIS). Since most of us here can't read Japanese, we've been trying a lot of things to get it translated. My IDE is CodeWright, which conveniently includes a Perl interpreter for macros. My hope is to be able to highlight a Japanese string and translate it on the fly. Eventually I want this to be a macro, but for now I'm writing a standalone script:
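#!perl -w
use strict;
use Jcode;            # provides getcode() and the Jcode class
use WWW::Babelfish;

my $DEBUG = 1;
my $text  = '';

# Read the Japanese input line by line and convert each line to UTF-8.
while (my $line = <>)
{
    chomp $line;      # drop the newline so it isn't doubled when re-added below
    if ($DEBUG)
    {
        # In list context getcode() returns the detected encoding name first.
        my ($code) = getcode($line);
        print "Chunk encoded as: " . (defined $code ? $code : 'unknown') . "\n";
    }
    my $j = Jcode->new($line);
    $text .= $j->utf8 . "\n";
}
print "\nText to send:\n" . $text . "\n" if $DEBUG;

print "\nConnecting to translator... please wait.\n\n";
my $obj = WWW::Babelfish->new();
die( "Babelfish server unavailable\n" ) unless defined($obj);

print "\nTranslating... this may take a loooong time.\n\n";
my $english = $obj->translate(
    source      => 'Japanese',
    destination => 'English',
    text        => $text,
    delimiter   => "\n",   # double quotes: '\n' would be a literal backslash and n
);
die( "Could not translate: " . $obj->error . "\n" ) unless defined($english);

print "\nTranslation: \n\n";
print $english;
print "\n";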
If it were an ASCII-friendly language like French or German, I wouldn't have any trouble. But since it's Japanese, I figured I'd have to convert the encoding to UTF-8 for Babelfish. I used Jcode to do this, but I'm not sure WWW::Babelfish is robust enough to handle multi-byte encodings. Any pointers would be appreciated.
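To separate the encoding question from the Babelfish question, the conversion step can be checked in isolation. The following is only a minimal sketch: it swaps Jcode for the core Encode module (so it assumes Perl 5.8 or later, which the interpreter bundled with CodeWright may not be), and sjis_to_utf8 is a hypothetical helper name, not part of any module.

```perl
#!perl -w
use strict;
use Encode qw(decode encode);   # core module in Perl 5.8+ (assumption: available here)

# Hypothetical helper: take Shift-JIS bytes, return the same text as UTF-8 bytes.
sub sjis_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode('shiftjis', $bytes);   # S-JIS bytes -> Perl characters
    return encode('UTF-8', $chars);           # Perl characters -> UTF-8 bytes
}

# The three characters of "Nihongo" in Shift-JIS, written as hex escapes
# so that this script itself stays pure ASCII.
my $sjis = "\x93\xFA\x96\x7B\x8C\xEA";
my $utf8 = sjis_to_utf8($sjis);

# Each of these characters takes 2 bytes in S-JIS and 3 bytes in UTF-8.
printf "%d S-JIS bytes -> %d UTF-8 bytes\n", length($sjis), length($utf8);
```

If the byte counts come out as 6 and 9, the conversion itself is working and any remaining garbling is more likely happening on the way to or from the translation service.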
--isotope
http://www.skylab.org/~isotope/