daptal has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to use the Text::Iconv module and am not getting the desired results.
Here is a code snippet:
    use strict;
    use Text::Iconv;

    my $converter = Text::Iconv->new('UTF-8','UTF-8');
    my @str = ('galicia','a coruñaa',,'a coruñaa',,'baĂCĂE kasĂCĂD nĂCĂD n msn ĂCĂE ifresini');
    push (@str ,' galicia a coruďż˝a ');
    foreach my $q (@str){
        my $result = $converter->convert($q);
        print "$q ==> $result \n";
    }

Result:

    galicia ==> galicia
    a coruñaa ==> a coruñaa
    a coruñaa ==> a coruñaa
    baĂCĂE kasĂCĂD nĂCĂD n msn ĂCĂE ifresini ==> baĂCĂE kasĂCĂD nĂCĂD n msn ĂCĂE ifresini
     galicia a coruďż˝a  ==>  galicia a coruďż˝a
whereas if I use the command-line tool:
iconv -c -f UTF-8 -t UTF-8 -o resultfile originalfile
the invalid characters are dropped. For example, diff resultfile originalfile reports a difference between these two lines:
    a corua
    a coru�a
Can you please let me know where I am going wrong?
Thanks

Replies are listed 'Best First'.
Re: Text::Iconv module usage
by graff (Chancellor) on Feb 17, 2009 at 02:20 UTC
    Do you have some compelling requirement to use Text::Iconv instead of the native Encode module and/or PerlIO encoding layers that come with Perl? Looking at these passages from the Text::Iconv docs, it seems like there are some good reasons to avoid it and use Encode/PerlIO instead:

    ... Settings of fromcode and tocode and their permitted combinations are implementation-dependent. Valid values are specified in the system documentation [i.e. on each particular system]...

    retval() returns the return value of the underlying iconv() function for the last conversion; according to the Single UNIX Specification, this value indicates "the number of non-identical conversions performed." Note, however, that iconv implementations vary widely in the interpretation of this specification...

    get_attr(): This method is only available with GNU libiconv, otherwise it throws an exception...

    I've had only passing acquaintance with iconv, and that was a few years ago, but based on those passages, it seems that if you decide to use Text::Iconv, you're asking for non-portability. The standard perl Encode module and PerlIO don't seem to have issues of that sort, from what I've seen so far.

    (I'm guessing that the problem with iconv might be the presence of several competing implementations -- despite the term "Single UNIX" in its description -- kind of like all the different versions of the "standard" ps utility.)

    As for the problem you're having with that code snippet... Why are you converting from utf8 to utf8? Could you try using the script in this node -- tlu -- TransLiterate Unicode -- to look at your code/data, and re-post with the actual unicode codepoint hex values for the characters in question? The character sequences in the OP (when I looked at it) seemed like corrupt data, not interpretable as utf8.
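
One way to produce the hex values requested above, without the tlu script itself, is a small dump routine. This is only a minimal sketch: `hex_bytes` and `codepoints` are hypothetical helper names, and the sample string is an assumption standing in for the real data.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of tlu): show each raw byte as two hex digits.
sub hex_bytes {
    my ($octets) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $octets;
}

# Hypothetical helper: show each decoded character as U+XXXX.
sub codepoints {
    my ($chars) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $chars;
}

# Assumed sample: the UTF-8 bytes for "coruña".
my $octets = "coru\xC3\xB1a";
my $chars  = decode('UTF-8', $octets);

print hex_bytes($octets), "\n";   # 63 6F 72 75 C3 B1 61
print codepoints($chars), "\n";   # U+0063 U+006F U+0072 U+0075 U+00F1 U+0061
```

Posting the byte dump alongside the code makes it clear whether the input is valid UTF-8 before any conversion is attempted.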

    If you don't have a firm, explainable reason why you must use Text::Iconv, I would suggest that you avoid using it. Perhaps tell us more about what processing problem you are trying to solve -- there's bound to be a better way to do it.

    (updated to replace unintended "link" with proper square brackets in quoted docs)
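
If the goal is what `iconv -c` does (drop bytes that are not valid UTF-8), something similar can be sketched with the core Encode module. This is an approximation, not an exact equivalent: `drop_invalid_utf8` is a hypothetical helper name, it relies on decode() substituting U+FFFD for malformed input under the default CHECK setting, and it will also remove any genuine U+FFFD characters already present in the data.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper approximating `iconv -c -f UTF-8 -t UTF-8`.
sub drop_invalid_utf8 {
    my ($octets) = @_;
    # With the default CHECK value, decode() replaces malformed
    # byte sequences with U+FFFD instead of dying.
    my $chars = decode('UTF-8', $octets);
    # Remove the replacement characters (assumption: the input holds
    # no legitimate U+FFFD that should be kept).
    $chars =~ tr/\x{FFFD}//d;
    return encode('UTF-8', $chars);
}

# Assumed sample: 0xF1 is n-tilde in Latin-1, but invalid here as UTF-8.
my $bad   = "a coru\xF1a";
my $clean = drop_invalid_utf8($bad);
print "$clean\n";   # a corua
```

Valid UTF-8 input passes through unchanged, so the helper can be applied to every line of a file.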

Re: Text::Iconv module usage
by almut (Canon) on Feb 17, 2009 at 02:05 UTC

    What are you trying to do? Convert UTF-8 to UTF-8?  If so, the problem presumably is that you haven't told Perl that your input strings (@str) are (supposed to be) in UTF-8.

    Try adding use utf8; to the script. It tells Perl that literal strings in the source code are in UTF-8 encoding.
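
The effect of the pragma can be seen by comparing string lengths. A minimal sketch, assuming the script file itself is saved as UTF-8 (the variable names are made up for illustration):

```perl
use strict;
use warnings;

# Without the pragma the literal is a string of raw bytes;
# with it, Perl decodes the literal into characters.
my $as_bytes = do { no utf8;  'coruña' };
my $as_chars = do { use utf8; 'coruña' };

print length($as_bytes), "\n";   # 7 (n-tilde occupies two bytes)
print length($as_chars), "\n";   # 6 (n-tilde is one character)
```

Note that a module such as Text::Iconv works on bytes, so whether the strings have been decoded into characters affects what gets passed to the converter.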

Re: Text::Iconv module usage
by syphilis (Archbishop) on Feb 17, 2009 at 05:34 UTC
    It looks to me as though the perl script correctly converts from UTF-8 to UTF-8 (and I get the same output as you). What is the problem with that output?

    I gather that the iconv command that you ran did not produce the desired result, but we don't know what was in "originalfile" to begin with.

    Cheers,
    Rob