PerlMonks  

Unidecode fails printing french accents

by edgreenberg (Acolyte)
on Jun 16, 2015 at 16:18 UTC ([id://1130630])

edgreenberg has asked for the wisdom of the Perl Monks concerning the following question:

I have variables containing a Unicode character that is encoded in UTF-8 as the bytes C3 A9 (é). A line of the hex dump looks like this:

00000750  27 41 63 63 6f 72 64 c3 a9 6f 6e 27 2c 20 27 64  |'Accord..on', 'd|

When I run this through Text::Unidecode, the é (e with acute accent) comes out as A(c) instead of a plain e.

I have:

use utf8;
use Text::Unidecode;

and later, roughly:

$var = unidecode($var);

Debugging shows that the variable is changed by the unidecode function:

Before: Montréal Centre-ville
After:  MontrA(c)al Centre-ville
I would expect a lowercase 'e', which is what I got before this program was ported from Perl 5.6.1 to 5.8.8 (and from FreeBSD to Linux).
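One way to make this kind of debugging conclusive is to print the code point of every character instead of the string itself. Below is a minimal sketch using only the core Encode module; the helper name `codepoints` is hypothetical. If the output shows U+00C3 U+00A9 where the é should be, the variable holds undecoded UTF-8 octets (Ã and © as far as Perl is concerned), which matches the A(c) that Text::Unidecode produced above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical debugging helper: list the code point of each character
sub codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $s;
}

my $octets = "Montr\xc3\xa9al";          # undecoded UTF-8 bytes, as read from a raw file
my $chars  = decode('UTF-8', $octets);   # the same text decoded into characters

print codepoints($octets), "\n";   # é shows up as two code points: U+00C3 U+00A9
print codepoints($chars),  "\n";   # é shows up as one code point:  U+00E9
```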

Any idea what I can do to correct this? The output is file names that become URLs, and the customer wants them to match what she had before, so that the URLs already indexed by Google don't change.

Thanks,

Ed Greenberg

Re: Unidecode fails printing french accents
by choroba (Cardinal) on Jun 16, 2015 at 16:26 UTC
    Works for me:
    #! /usr/bin/perl
    use warnings;
    use strict;
    use utf8;
    use Text::Unidecode;

    my $input = 'Montréal Centre-ville';
    print unidecode($input);

    __END__
    Output:
    Montreal Centre-ville

    If you're reading the input from a file, you don't need use utf8 (it only tells Perl that the script's own source code is UTF-8). Instead, you should specify the encoding of the file:

    open my $IN, '<:encoding(UTF-8)', 'input.txt' or die $!;
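A minimal sketch of the difference the layer makes, assuming the input file is UTF-8. It writes the octets for Montréal to a temporary file (a stand-in for the real input.txt) and reads the line back once without a layer and once with :encoding(UTF-8). Only the core File::Temp module and Perl's built-in I/O layers are used.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the real input file: raw UTF-8 octets for 'Montréal'
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode $fh, ':raw';
print {$fh} "Montr\xc3\xa9al\n";
close $fh or die $!;

# No layer: every byte becomes one character, so é arrives as two
open my $RAW, '<', $filename or die $!;
chomp(my $raw = <$RAW>);
close $RAW;

# Decoding layer: the two bytes C3 A9 become the single character U+00E9
open my $IN, '<:encoding(UTF-8)', $filename or die $!;
chomp(my $decoded = <$IN>);
close $IN;

printf "raw: %d chars, decoded: %d chars\n", length $raw, length $decoded;
print "decoded equals Montr\\x{e9}al: ", ($decoded eq "Montr\x{e9}al" ? 'yes' : 'no'), "\n";
```

Only the decoded string is what unidecode expects; the raw one is what produces A(c).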

    Update: You should consider editing the title. Unidecode fails to recognise the accents; it wouldn't print them even if it worked.

Re: Unidecode fails printing french accents (octets)
by Anonymous Monk on Jun 16, 2015 at 23:04 UTC
    Well, you probably don't have Unicode characters inside $var. You're starting with octets (raw bytes), not Unicode, so you can't blame unidecode; you don't have Unicode.
    $ perl -E " use Text::Unidecode; say unidecode( qq{\x4d\x6f\x6e\x74\x72\xc3\xa9\x61\x6c} )"
    MontrA(c)al
    $ perl -E " use Text::Unidecode; use Encode; say unidecode( decode_utf8( qq{\x4d\x6f\x6e\x74\x72\xc3\xa9\x61\x6c}))"
    Montreal

    perlunitut: Unicode in Perl, section "I/O flow" (the actual 5-minute tutorial)

Re: Unidecode fails printing french accents
by andal (Hermit) on Jun 17, 2015 at 07:09 UTC

    You should show how you get data into your $var. Most likely you have a sequence of octets there instead of Unicode characters. That is why Text::Unidecode fails: it expects a sequence of Unicode characters as input.
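A minimal sketch of decoding once, at the point where data enters the program, assuming the external source (file, database, CGI parameter) is supposed to deliver UTF-8. The helper `to_chars` is hypothetical; it calls Encode's decode with FB_CROAK so that bytes in some other encoding (here a Latin-1 string) fail loudly instead of silently reaching unidecode as the wrong characters.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: turn UTF-8 octets from an external source into
# characters, dying on malformed input instead of guessing.
sub to_chars {
    my ($octets) = @_;   # a copy, because decode() with FB_CROAK may modify its argument
    return decode('UTF-8', $octets, FB_CROAK);
}

my $from_utf8_source   = "Montr\xc3\xa9al";   # UTF-8 octets
my $from_latin1_source = "Montr\xe9al";       # ISO-8859-1 octets, not valid UTF-8

my $ok = to_chars($from_utf8_source);
printf "decoded to %d characters\n", length $ok;

my $bad = eval { to_chars($from_latin1_source) };
if (defined $bad) {
    print "decoded\n";
}
else {
    print "not valid UTF-8: $@";
}
```

Once $var holds characters, unidecode($var) sees a single é and transliterates it to e, as in choroba's example.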

Node Type: perlquestion [id://1130630]
Front-paged by Corion