kemuri has asked for the wisdom of the Perl Monks concerning the following question:
Hello everybody!
I wrote a shell application in C for Linux which conjugates Japanese verbs using string functions.
Wait wait! It's about perl, I swear.
I'd like to implement a function that converts hiragana into romaji, that is: さむらい ー[function]ー> samurai
Since that's a lot of work for someone as lazy as me, I figured someone must have written this before, and I was obviously right. But the one I liked is written in Perl. One day I'll do it in C, but meanwhile I need your help!
This is the project: Lingua::JA::Moji
And the function I'm interested in is: kana2romaji
Okay, that's the end of the introduction. I started flirting with Perl only two days ago, and I can't yet learn enough to solve this without help.
What I want:
1 Read a text file of n lines, each containing one hiragana word
2 Convert each word with kana2romaji
3 Write the result to another file in the same layout, but in romaji
The code I have:
    use Lingua::JA::Moji qw/kana2romaji romaji2kana/;
    #use utf8;
    #use Encode;
    if( $#ARGV < 1 ){
        die("Not enough arguments\n");
    }
    open(INP, "<$ARGV[0]") or die("Cannot open file '$ARGV[0]' for reading\n");
    open(OUTP, ">$ARGV[1]") or die("Cannot open file '$ARGV[1]' for writing\n");
    my @hira = <INP>;
    print OUTP kana2romaji (@hira);
    close INP;
    close OUTP;
But kana2romaji complains that the input is not Unicode; more precisely, it complains that the Unicode flag is off.
I found some possible solutions, but I haven't managed to get any of them to work:
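For context, the "Unicode flag" in that complaint is presumably Perl's internal UTF-8 flag, which is off for raw bytes read from a file and on for text that has been decoded into characters. The following is a minimal sketch of that difference, using only the core Encode module; the variable names are illustrative, and the byte string stands in for one line read from a file without an encoding layer.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 bytes of the hiragana word さむらい, as they would arrive
# from a filehandle opened without an encoding layer.
my $bytes = "\xE3\x81\x95\xE3\x82\x80\xE3\x82\x89\xE3\x81\x84";

# decode() turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

# Report the length and the state of the internal UTF-8 flag for both.
printf "bytes: length %d, flag %s\n",
    length($bytes), (utf8::is_utf8($bytes) ? 'on' : 'off');
printf "chars: length %d, flag %s\n",
    length($chars), (utf8::is_utf8($chars) ? 'on' : 'off');
```

The byte string counts twelve units and the decoded string counts four characters, which is the form a character-oriented converter would need.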
Using Encode library
Using utf8 library

    use Encode;
    @hira = Encode::decode( 'utf8', @hira );
    #if encode('utf8', decode('utf8', @hira), @hira));
    @hira = decode("utf-8", @hira );
    binmode @hira, ':encoding(utf8)';

I know more or less the differences between each function, but I get lost. The input file was created with gedit, so I guess it is Unicode... I got the ideas mainly from Encode.

    use utf8;
    _utf8_on(@hira);
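A likely reason these attempts do not help: `decode` works on a single string rather than a whole array, `binmode` operates on a filehandle rather than on data, `use utf8` only declares that the script's own source code is UTF-8, and `_utf8_on` (an Encode function, not part of the utf8 pragma) merely sets the flag without checking the data. Below is a minimal sketch of decoding an array line by line, assuming the file really is UTF-8; `@hira` is filled with literal bytes here as a stand-in for `my @hira = <INP>;`.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for "my @hira = <INP>;" on a handle without an encoding layer:
# two lines of raw UTF-8 bytes (さむらい and ねこ), newlines included.
my @hira = (
    "\xE3\x81\x95\xE3\x82\x80\xE3\x82\x89\xE3\x81\x84\n",
    "\xE3\x81\xAD\xE3\x81\x93\n",
);

# decode() handles one string at a time, so apply it to each line in turn.
my @decoded = map { decode('UTF-8', $_) } @hira;

# Remove the trailing newlines so that each element is a bare word.
chomp @decoded;

# Each element is now a character string of hiragana.
printf "%d characters\n", length($_) for @decoded;
```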
The program complains:
"Input is not flagged as unicode: conversion will fail. at file.pl line x"
I should probably pass some parameters to kana2romaji to get a one-to-one mapping (one word to one word, rather than one word to several possible words), but that's a detail.
I hope you will help me with my flash trip into Perl. I'm sure that if I succeed I'll come back later :)
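The three steps listed under "What I want" could be sketched as follows, letting the filehandles do the decoding and encoding through an `:encoding(UTF-8)` layer. This is only a sketch under assumptions: `convert_file` is a hypothetical helper, not part of Lingua::JA::Moji, and the conversion is passed in as a code reference so that the block depends on nothing beyond core Perl.

```perl
use strict;
use warnings;

# convert_file() is a hypothetical helper, not part of Lingua::JA::Moji.
# $convert is any code reference that maps one decoded word to its output form.
sub convert_file {
    my ($in_path, $out_path, $convert) = @_;

    # The :encoding(UTF-8) layer decodes on read and encodes on write,
    # so the strings seen inside the loop are character strings.
    open(my $in, '<:encoding(UTF-8)', $in_path)
        or die "Cannot open file '$in_path' for reading: $!\n";
    open(my $out, '>:encoding(UTF-8)', $out_path)
        or die "Cannot open file '$out_path' for writing: $!\n";

    while (my $word = <$in>) {
        chomp $word;
        next if $word eq '';                   # skip blank lines
        print {$out} $convert->($word), "\n";  # one converted word per line
    }

    close $in;
    close $out or die "Cannot close file '$out_path': $!\n";
    return;
}
```

With Lingua::JA::Moji installed and kana2romaji imported, a call along the lines of `convert_file($ARGV[0], $ARGV[1], sub { kana2romaji($_[0]) })` would be one way to plug the real converter in, one word per call.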
Thank you guys!
Replies are listed 'Best First'.

Re: UTF8 issues by choroba (Cardinal) on Nov 14, 2010 at 23:10 UTC
Re: UTF8 issues by ikegami (Patriarch) on Nov 15, 2010 at 00:34 UTC
Re: UTF8 issues by Jim (Curate) on Nov 15, 2010 at 02:21 UTC
Re: UTF8 issues by Jim (Curate) on Nov 15, 2010 at 03:57 UTC