7stud has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks,
I have a Latin-1 string that I am trying to do some substitutions on, but the substitution is not happening:

    use strict;
    use warnings;
    use 5.010;
    use Encode;

    my $str = "ThÂt Âpple";    #'A with circumflex'
    say $str;

    my $unicode_str = decode('iso-8859-1', $str);
    my $pattern = "\x{00C2}";    #unicode for 'A with circumflex'
    $unicode_str =~ s/$pattern//g;

    my $latin1_str = encode('iso-8859-1', $unicode_str);
    say $latin1_str;

The output of the first say() indicates that something is wrong from the very beginning: instead of an "A with circumflex", I see an "A with tilde". Then no substitution is performed, and the second say() prints the same string as the first one. My terminal is set to Latin-1.
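A likely explanation, assuming the script file is saved as UTF-8 (the A-with-tilde symptom points that way): in UTF-8, U+00C2 is the two bytes C3 82, and without use utf8 Perl keeps those bytes as two separate characters. Byte C3 is "A with tilde" in Latin-1 and 82 is an unprintable control code, which matches what the first say() shows. The sketch below spells those bytes out with escapes, so it behaves the same however it is saved; the variable names are illustrative only. It shows that decode('iso-8859-1', ...) turns each A with circumflex into U+00C3 followed by U+0082, so the pattern \x{00C2} never matches.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the on-disk bytes of the original literal:
# in a UTF-8 file each 'A with circumflex' is the byte pair C3 82.
my $bytes = "Th\xC3\x82t \xC3\x82pple";

# Latin-1 decoding maps every byte to one character, so each pair
# becomes U+00C3 ('A with tilde') followed by U+0082 (a C1 control).
my $chars = decode('iso-8859-1', $bytes);

my @codepoints = map { sprintf('U+%04X', ord($_)) } split //, substr($chars, 2, 2);
print "@codepoints\n";    # U+00C3 U+0082

# The pattern looks for U+00C2, which never occurs, so nothing is removed.
my $count = ($chars =~ s/\x{00C2}//g);
print 'substitutions: ', ($count || 0), "\n";    # substitutions: 0
```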
If I add a use utf8 statement, the first say() shows an "A with diaeresis (umlaut)", but the substitutions are performed and the second say() outputs "Tht pple". However, the utf8 docs specifically warn against using use utf8 for upper Latin-1 characters, and since I'm not using any UTF-8 characters in my program file, it doesn't make sense to me to include that statement.
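For comparison, use utf8 makes Perl decode the source file's string literals as UTF-8, so each A with circumflex becomes the single character U+00C2 and the pattern matches. If the file really is saved as UTF-8, as the A-with-tilde symptom suggests, that is the case the pragma is documented for. The minimal sketch below imitates the pragma with Encode's decode('UTF-8', ...) applied to a hypothetical copy of the literal's bytes; it illustrates the mechanism and is not the only possible fix. Variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for the UTF-8 bytes of the original literal.
my $bytes = "Th\xC3\x82t \xC3\x82pple";

# Decoding as UTF-8 (what use utf8 does to literals) yields one
# character, U+00C2, per 'A with circumflex'.
my $chars = decode('UTF-8', $bytes);
printf "length %d, char 2 is U+%04X\n", length($chars), ord(substr($chars, 2, 1));

# Now the substitution finds U+00C2 and removes every occurrence.
(my $stripped = $chars) =~ s/\x{00C2}//g;

# Encode the remaining characters back to Latin-1 bytes for output.
my $latin1 = encode('iso-8859-1', $stripped);
print "$latin1\n";    # Tht pple
```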
How can I get the first say() to output a string where I see "A with circumflex", and how can I get the substitution to work as well?
Thanks
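Another approach, sketched under the assumption that the terminal really does expect Latin-1: write the character as the escape \x{C2} so the string no longer depends on how the file is saved, and let an :encoding(iso-8859-1) output layer on STDOUT do the conversion instead of calling decode() and encode() by hand. The variable name $stripped is illustrative.

```perl
use strict;
use warnings;
use 5.010;

# Declare once that STDOUT talks Latin-1; Perl then encodes each
# character on output, so no manual encode() call is needed.
binmode STDOUT, ':encoding(iso-8859-1)';

# \x{C2} is 'A with circumflex'; the escape is plain ASCII, so the
# file's own encoding cannot change what the string contains.
my $str = "Th\x{C2}t \x{C2}pple";
say $str;               # shows A with circumflex on a Latin-1 terminal

# Copy the string and strip every U+00C2 from the copy.
(my $stripped = $str) =~ s/\x{C2}//g;
say $stripped;          # Tht pple
```

Because every non-ASCII character is written as an escape, adding or omitting use utf8 makes no difference to what this version's strings contain.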
Re: encoding question
by moritz (Cardinal) on May 20, 2010 at 07:13 UTC
by 7stud (Deacon) on May 20, 2010 at 07:26 UTC
by moritz (Cardinal) on May 20, 2010 at 08:17 UTC
by 7stud (Deacon) on May 21, 2010 at 03:21 UTC
by moritz (Cardinal) on May 21, 2010 at 07:03 UTC

Re: encoding question
by Krambambuli (Curate) on May 20, 2010 at 07:52 UTC

Re: encoding question
by JavaFan (Canon) on May 20, 2010 at 07:54 UTC
by Krambambuli (Curate) on May 20, 2010 at 08:56 UTC
by moritz (Cardinal) on May 20, 2010 at 09:00 UTC
by JavaFan (Canon) on May 20, 2010 at 13:38 UTC
by ikegami (Patriarch) on May 21, 2010 at 04:21 UTC
by 7stud (Deacon) on May 23, 2010 at 04:17 UTC
by choroba (Cardinal) on May 20, 2010 at 08:34 UTC