Text::Iconv module usage

daptal has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to use the Text::Iconv module and am not able to get the desired results.
Here is a code snippet:

use strict;
use Text::Iconv;

my $converter = Text::Iconv->new('UTF-8','UTF-8');

my @str = ('galicia','a coruÃÂ±aa',,'a coruÃ±aa',,'baÃCÃE kasÃCÃD nÃCÃD n msn ÃCÃE ifresini');

push (@str ,' galicia a coruï¿½a ');

foreach my $q (@str){
        my $result = $converter->convert($q);
        print "$q  ==>  $result \n";

}


Result:-
galicia  ==>  galicia
a coruÃÂ±aa  ==>  a coruÃÂ±aa
a coruÃ±aa  ==>  a coruÃ±aa
baÃCÃE kasÃCÃD nÃCÃD n msn ÃCÃE ifresini  ==>  baÃCÃE kasÃCÃD nÃCÃD n msn ÃCÃE ifresini
 galicia a coruï¿½a   ==>   galicia a coruï¿½a

However, if I use the command-line tool, the output differs from the original file:
iconv -c -f UTF-8 -t UTF-8 -o resultfile originalfile
For example, diff resultfile originalfile reports "a corua" where the original has "a coruï¿½a".
Can you please let me know where I am going wrong?
Thanks


Replies are listed 'Best First'.
Re: Text::Iconv module usage by graff (Chancellor) on Feb 17, 2009 at 02:20 UTC
Do you have some compelling requirement to use Text::Iconv instead of the native Encode module and/or PerlIO encoding layers that come with Perl? Looking at these passages from the Text::Iconv docs, it seems like there are some good reasons to avoid it and use Encode/PerlIO instead:

    ... Settings of fromcode and tocode and their permitted combinations are implementation-dependent. Valid values are specified in the system documentation [i.e. on each particular system]...

    retval() returns the return value of the underlying iconv() function for the last conversion; according to the Single UNIX Specification, this value indicates "the number of non-identical conversions performed." Note, however, that iconv implementations vary widely in the interpretation of this specification...

    get_attr(): This method is only available with GNU libiconv, otherwise it throws an exception...

I've had only passing acquaintance with iconv, and that was a few years ago, but based on those passages, it seems that if you decide to use Text::Iconv, you're asking for non-portability. The standard perl Encode module and PerlIO don't seem to have issues of that sort, from what I've seen so far. (I'm guessing that the problem with iconv might be the presence of several competing implementations -- despite the term "Single UNIX" in its description -- kind of like all the different versions of the "standard" `ps` utility.)

As for the problem you're having with that code snippet... Why are you converting from utf8 to utf8? Could you try using the script in this node -- tlu -- TransLiterate Unicode -- to look at your code/data, and re-post with the actual unicode codepoint hex values for the characters in question? The character sequences in the OP (when I looked at it) seemed like corrupt data, not interpretable as utf8.

If you don't have a firm, explainable reason why you must use Text::Iconv, I would suggest that you avoid using it. Perhaps tell us more about what processing problem you are trying to solve -- there's bound to be a better way to do it.

(updated to replace unintended "link" with proper square brackets in quoted docs)
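For what it's worth, here is a minimal sketch of the Encode-based route suggested above, assuming the goal is roughly what `iconv -c` does (silently drop bytes that are not valid UTF-8). The helper name `drop_invalid_utf8` is made up for illustration. It relies on Encode's default behaviour of substituting U+FFFD for each malformed sequence, so it will also remove any U+FFFD characters that were legitimately present in the input.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return a copy of $bytes with malformed UTF-8 removed.
sub drop_invalid_utf8 {
    my ($bytes) = @_;

    # With no CHECK argument, decode() replaces each malformed
    # sequence with the substitution character U+FFFD.
    my $chars = decode('UTF-8', $bytes);

    # Delete the substitution characters (this also deletes any
    # U+FFFD that was validly encoded in the input).
    $chars =~ tr/\x{FFFD}//d;

    return encode('UTF-8', $chars);
}

# "\xF1" is Latin-1 for n-tilde; on its own it is not valid UTF-8.
my $clean = drop_invalid_utf8("a coru\xF1a");
print "$clean\n";    # a corua
```

Input that is already valid UTF-8 should pass through unchanged, which would explain why a UTF-8 to UTF-8 conversion looks like a no-op on well-formed data.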
Re: Text::Iconv module usage by almut (Canon) on Feb 17, 2009 at 02:05 UTC
What are you trying to do? Convert UTF-8 to UTF-8? If so, the problem presumably is that you haven't told Perl that your input strings (`@str`) are (supposed to be) in UTF-8. Try `use utf8;`, which tells Perl that literal strings in the source code are in UTF-8 encoding.
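A small sketch of the distinction involved, written with byte escapes so it does not depend on how the script file itself is saved. Decoding the bytes explicitly with the core Encode module has roughly the effect that `use utf8;` has on a UTF-8 literal in the source: Perl then sees characters rather than raw bytes. The variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoding of "a coruña": n-tilde is the two bytes C3 B1.
my $bytes = "a coru\xC3\xB1a";

# Decode the bytes into a string of characters.
my $chars = decode('UTF-8', $bytes);

printf "%d bytes, %d characters\n", length($bytes), length($chars);
# prints "9 bytes, 8 characters"
```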
Re: Text::Iconv module usage by syphilis (Archbishop) on Feb 17, 2009 at 05:34 UTC
It looks to me that the perl script correctly converts from UTF-8 to UTF-8 (and I get the same output as you). What is the problem with that output? I gather that the `iconv` command that you ran did not produce the desired result, but we don't know what was in "originalfile" to begin with. Cheers, Rob
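One way to show what the data actually contains is to post the raw bytes in hex together with a validity check. The sketch below uses only core Perl; `describe_bytes` is a hypothetical helper name, and the two sample strings are made up rather than taken from "originalfile".

```perl
use strict;
use warnings;

# Hypothetical helper: hex-dump a byte string and say whether it is valid UTF-8.
sub describe_bytes {
    my ($bytes) = @_;

    # One two-digit (lowercase) hex group per byte.
    my $hex = join ' ', unpack('(H2)*', $bytes);

    # utf8::decode() works in place and returns false on malformed
    # input, so run it on a copy.
    my $copy  = $bytes;
    my $valid = utf8::decode($copy) ? 'valid UTF-8' : 'NOT valid UTF-8';

    return "$hex ($valid)";
}

print describe_bytes("coru\xC3\xB1a"), "\n";  # 63 6f 72 75 c3 b1 61 (valid UTF-8)
print describe_bytes("coru\xF1a"), "\n";      # 63 6f 72 75 f1 61 (NOT valid UTF-8)
```

To apply this to a file, read it through a filehandle opened with the `:raw` layer so that no decoding happens before the dump.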