swapnil has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I am using the Text::Unaccent module to unaccent a string. This is my code:
#!/usr/bin/perl
use strict;
use warnings;
use Text::Unaccent;

my $str = "Los-Cabos-Meliá";
#my $str = "This is a simple string";

my $unaccented = unac_string('UTF-8',$str);

print "Original : ".$str."\n";
print "Unaccent : ".$unaccented."\n";

This works fine on my local machine, but strangely it gives an 'Invalid argument' error on another test machine with the same version of Text::Unaccent. Please suggest what the issue might be.

--------- OUTPUT ---------------
unac_string: Invalid argument
Original : Los-Cabos-Meli
Use of uninitialized value $unaccented in concatenation (.) or string at remove_accent2.pl line 12.
Unaccent :

Thanks,
Swapnil

Replies are listed 'Best First'.
Re: Issue with unac_string()
by choroba (Cardinal) on May 17, 2012 at 09:16 UTC
    You should tell Perl your script is written in UTF-8. Add
    use utf8;
    to your script.
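
As a rough illustration of what use utf8 changes (a sketch using only the core Encode module, not Text::Unaccent itself): without the pragma, Perl treats a non-ASCII string literal as raw bytes; with it, the literal is decoded into characters. The variable names are only for illustration, and the byte escapes stand in for a literal typed into a UTF-8 source file.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "Meli" plus a-acute, written out as the two bytes UTF-8 uses for it.
my $bytes = "Meli\xC3\xA1";

# Decoding the bytes is roughly what `use utf8` does for string literals.
my $chars = decode('UTF-8', $bytes);

# Prints "bytes: 6, characters: 5".
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
```

This only works as intended if the file really is saved as UTF-8; the pragma does not convert a file stored in another encoding.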
Re: Issue with unac_string()
by Khen1950fx (Canon) on May 17, 2012 at 10:33 UTC
    Using Text::Unaccent::PurePerl, Encode, Encode::Detect:
    #!/usr/bin/perl -l
    use strict;
    use warnings;
    use Text::Unaccent::PurePerl qw(unac_string);
    use Encode;
    require Encode::Detect;

    my $str = "Los-Cabos-Meliá";
    my $utf8 = decode('Detect', $str);

    binmode STDOUT, ":encoding(UTF-8)";
    print "Original : $utf8";

    my $unaccented = unac_string($utf8);
    print "Unaccent : $unaccented";
      With use utf8, I get the following warning: "Malformed UTF-8 character (1 byte, need 3, after start byte 0xe1)".
      I also tried this code, but it didn't work:
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Text::Unaccent::PurePerl qw (unac_string);
      use URI::Escape;

      my $str = "Los-Cabos-Meliá";
      #my $str = "This is a simple string";
      #my $str = "Zo%C3%ABtry-Casa-del- (Mar) -Los-Cabos";
      #my $str = "http%3A%2F%2Fwww.travelnow.com%2Fvtours%2F281578.xml";

      #my $unescaped = uri_unescape($str);
      #my $unaccented = unac_string('UTF-8',$unescaped);
      my $unaccented = unac_string('UTF-8',$str);

      print "Original : ".$str."\n";
      #print "Unescape : ".$unescaped."\n";
      print "Unaccent : ".$unaccented."\n";
      the output is:
      Original : Los-Cabos-Meli
      Unaccent : Los-Cabos-Meli�
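
The warning quoted above mentions a lone start byte 0xe1. In UTF-8 that byte opens a three-byte sequence, while in ISO-8859-1 (Latin-1) it is the single-byte encoding of á, so one possible explanation is that the script file on the failing machine is saved as Latin-1 rather than UTF-8. The following is a minimal sketch of how that could be checked and repaired with the core Encode module. The helper name is_valid_utf8 is made up for this example, and the Latin-1 guess is an assumption, not something the thread confirms.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: true if $bytes is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

my $latin1_bytes = "Los-Cabos-Meli\xE1";      # a-acute as one Latin-1 byte
my $utf8_bytes   = "Los-Cabos-Meli\xC3\xA1";  # a-acute as two UTF-8 bytes

print "latin1 input valid UTF-8? ", is_valid_utf8($latin1_bytes), "\n";
print "utf8 input valid UTF-8?   ", is_valid_utf8($utf8_bytes), "\n";

# Assuming the input really is Latin-1, re-encode it as UTF-8 bytes.
my $fixed = encode('UTF-8', decode('ISO-8859-1', $latin1_bytes));
print "re-encoded matches UTF-8 form\n" if $fixed eq $utf8_bytes;
```

If the check fails on the real script, re-saving the file as UTF-8 in the editor would be the simpler fix; the re-encoding step is mainly useful for data that arrives at run time.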
        Ah! You're functioning better than I am:). This seems to work for me. Let me know if it works for you.
        #!/usr/bin/perl -l
        use strict;
        use warnings;
        use Encode;
        require Encode::Detect;
        use Text::Unaccent::PurePerl qw(unac_string);
        use URI::Escape::XS qw/uri_escape uri_unescape/;

        # "\303\241" is the UTF-8 byte pair for a-acute, written as octal escapes
        my $str = "Los-Cabos-Meli\303\241";
        my $safe = uri_escape($str, "\303\241");
        $str = uri_unescape($safe);

        my $utf8 = decode('Detect', $str);
        binmode STDOUT, ":encoding(UTF-8)";
        print "Original: $utf8";

        my $unaccented = unac_string($utf8);
        print "Unaccented: $unaccented";
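
If installing extra modules on the test machine is a problem, a dependency-free approximation is possible with the core Unicode::Normalize module. Note that this is a different implementation from Text::Unaccent, so results may differ: it decomposes each character and drops the combining marks, which means it only affects letters that have a canonical decomposition (characters such as ø or ß pass through unchanged). The helper name strip_marks is invented for this sketch.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose, then delete combining marks (category Mn).
sub strip_marks {
    my ($text) = @_;                # expects a decoded character string
    my $decomposed = NFD($text);    # a-acute becomes "a" + U+0301
    $decomposed =~ s/\p{Mn}//g;     # remove the combining accent
    return $decomposed;
}

my $str = "Los-Cabos-Meli\x{E1}";   # \x{E1} is the character a-acute
print strip_marks($str), "\n";      # prints "Los-Cabos-Melia"
```

The input must already be a character string, so bytes read from a file or URL would need to be decoded first.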