PerlMonks  

Accented characters and others...

by kepler (Scribe)
on Oct 24, 2015 at 19:37 UTC ( [id://1145856]=perlquestion )

kepler has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm having a hard time with this subject: I'm trying to transform a string that might contain accented or other special characters (like á, ì, ç, etc.), but without any luck. I've tried using Unicode::Normalize:

    use Unicode::Normalize;
    use Encode;
    $str = "áçò";
    $string = decode("ISO-8859-1", $str); #windows-1250
    $string = NFD($string);
    $string =~ s/\pM//og;

It works on my laptop, but not on my webserver. I've installed the module (I got no error). When I process "porto de mós", for instance, I get "porto de mas". The accented characters are always replaced by an "a"... Any ideas? I've also tried Text::Unidecode, but I get even weirder characters...

Kind regards, Kepler

Replies are listed 'Best First'.
Re: Accented characters and others...
by Corion (Patriarch) on Oct 25, 2015 at 07:32 UTC

    Most likely, what you have is not encoded in the character set you think, then.

    I found Text::Unidecode to work pretty well, provided that its input is correctly encoded UTF-8.

    I recommend that you start with explicitly encoded strings and see if these work for you:

        #!perl
        use strict;
        use Text::Unidecode;
        my $str = "\N{LATIN SMALL LETTER A WITH ACUTE}\N{LATIN SMALL LETTER C WITH CEDILLA}\N{LATIN SMALL LETTER O WITH GRAVE}";
        print unidecode($str); # aco

    If that works for you, you can now take slow steps in the direction of reading data and properly calling Encode::decode() on it, while trying to find the appropriate character set(s) that the data is provided in.
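One way to take that step is sketched below. It is only an illustration, not Corion's code: the helper name strip_accents is hypothetical, and the choice of "strict UTF-8 first, ISO-8859-1 as fallback" is an assumption about which encodings the data might arrive in. Note that some ISO-8859-1 byte sequences are also valid UTF-8, so this guess can be wrong for unlucky input.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decode raw bytes, then strip combining marks.
# Assumption: the input is either UTF-8 or ISO-8859-1.
sub strip_accents {
    my ($bytes) = @_;

    # decode() with a CHECK value may modify its argument, so work on a copy.
    my $copy  = $bytes;
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };

    # Not valid UTF-8: fall back to ISO-8859-1, which accepts any byte.
    $chars = decode('ISO-8859-1', $bytes) unless defined $chars;

    # Decompose accented letters, then drop the combining marks.
    my $decomposed = NFD($chars);
    $decomposed =~ s/\pM//g;
    return $decomposed;
}

# The same word, "mós", once as UTF-8 bytes and once as ISO-8859-1 bytes.
print strip_accents("m\xC3\xB3s"), "\n";
print strip_accents("m\xF3s"), "\n";
```

Both calls should yield the same unaccented text if the guess about the encoding holds. If the two machines disagree, the input bytes are the first thing to compare.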

Re: Accented characters and others...
by stevieb (Canon) on Oct 25, 2015 at 04:14 UTC
    I'm not a Unicode person, but if this script produces different output on separate systems, I'd ask you to post the platforms, along with the relevant parts of perl -V on each one. Those who can help may find this version info useful.
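As a complement to perl -V, a short script can collect the details most likely to matter here. This is a minimal sketch: the helper name env_report is hypothetical, and the set of fields it prints is an assumption about what is relevant, not a list taken from this thread.

```perl
use strict;
use warnings;
use Encode ();
use Unicode::Normalize ();

# Hypothetical helper: gather version and locale details as one text block.
sub env_report {
    my $lang = defined $ENV{LANG} ? $ENV{LANG} : '(unset)';
    return join "\n",
        'perl:               ' . $],
        'os:                 ' . $^O,
        'Encode:             ' . Encode->VERSION,
        'Unicode::Normalize: ' . Unicode::Normalize->VERSION,
        'LANG:               ' . $lang,
        '';
}

print env_report();
```

Running it on both the laptop and the webserver and comparing the two reports shows quickly whether the Perl version, the module versions, or the locale differ.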
Re: Accented characters and others...
by Anonymous Monk on Oct 25, 2015 at 15:39 UTC
    Use a monitoring tool, maybe WireShark, that can show you the ACTUAL bytes that are being exchanged between client and server ... as in "hexadecimal."

      And then what? What if the bytes correspond exactly to the charset and what is shown, which is what’s going to happen? Some code to diagnose? Some ideas for what to expect? Why Wireshark when all the modern browsers have trustworthy dev panels, or xxd|od + curl|wget give you easier, faster access to the information? The exact hex codes you predict will match? YOU just “can’t stop” … yourself. Probably it was another lost session. You will get slightly more upvotes this way. It will contribute to thinking this is a conspiracy. That’ll be wrong.
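For a diagnosis from inside the script itself, the raw bytes can be printed in hex before any decoding happens. The sketch below is an illustration only; the helper name hex_bytes is hypothetical, and it assumes the string still holds undecoded bytes (every character value below 256).

```perl
use strict;
use warnings;

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

# "mós" as ISO-8859-1 bytes: the ó is the single byte f3.
print hex_bytes("m\xF3s"), "\n";       # 6d f3 73

# "mós" as UTF-8 bytes: the ó is the two-byte sequence c3 b3.
print hex_bytes("m\xC3\xB3s"), "\n";   # 6d c3 b3 73
```

Seeing c3 b3 where f3 was expected would mean the data is UTF-8, so decoding it as ISO-8859-1 turns each accented letter into two unrelated characters.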

Node Type: perlquestion [id://1145856]
Approved by AppleFritter