I am trying to convert the euro symbol from UTF-8 to ISO-8859-1, so far with no luck. I am using Perl 5.8.5.
I have a template, which is in UTF-8 and filled with UTF-8 data (from another source). I'm encoding the result as ISO-8859-1 using Unicode::MapUTF8::from_utf8() (version 1.09).
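For context, ISO-8859-1 itself has no euro sign. U+20AC EURO SIGN is 0xA4 in ISO-8859-15 and 0x80 in Windows-1252 (cp1252), which is the code alt-0128 enters on a Western Windows setup. The sketch below uses Perl's core Encode module, not the Unicode::MapUTF8 module used in this post, to print the bytes each charset gives for the euro. With Encode's default fallback an unmappable character is replaced by a substitution character instead of raising an error. The helper name euro_bytes is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the bytes for U+20AC EURO SIGN in the given charset, as hex digits.
# With Encode's default fallback, an unmappable character becomes a substitution character.
sub euro_bytes {
    my $charset = shift;
    return unpack('H*', encode($charset, "\x{20AC}"));
}

for my $charset (qw(iso-8859-1 iso-8859-15 cp1252)) {
    printf "%-12s %s\n", $charset, euro_bytes($charset);
}
```

Only the last two charsets produce a real euro byte; for ISO-8859-1 there is nothing correct to produce.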
This is what I have tried, with the following conversion code:
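use Unicode::MapUTF8 qw(from_utf8);

sub sms_encode {
    my $text = shift or return undef;    # false input (undef, '' or '0') returns undef
    # Convert the UTF-8 input to ISO-8859-1
    my $new = from_utf8({-string => $text, -charset => 'ISO-8859-1'});
    return $new;
}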
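A sketch of a more direct alternative, assuming the receiving side accepts Windows-1252 rather than strict ISO-8859-1: decode the UTF-8 input into Perl characters and re-encode it with the core Encode module (bundled with Perl since 5.8.0). Passing Encode::FB_CROAK makes any character the target charset cannot represent die loudly instead of silently turning into whitespace, and Encode::LEAVE_SRC stops encode() from modifying its input. The name sms_encode_cp1252 is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical replacement for sms_encode(): UTF-8 bytes in, Windows-1252 bytes out.
# Assumes $text holds UTF-8 encoded bytes, not already-decoded characters.
sub sms_encode_cp1252 {
    my $text = shift;
    return undef unless defined $text;
    my $chars = decode('UTF-8', $text);    # UTF-8 bytes -> Perl characters
    # FB_CROAK: die on characters cp1252 cannot hold; LEAVE_SRC: leave $chars untouched.
    return encode('cp1252', $chars, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}
```

For example, the UTF-8 input "5 \xE2\x82\xAC" comes back as "5 \x80". If the recipient needs ISO-8859-15 instead, replacing 'cp1252' with 'iso-8859-15' maps the euro to 0xA4.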
1. Typing a literal € in the template -> a literal € in the result
2. Using the symbol typed in UTF-8 in the template -> whitespace
3. Typing alt-0128 on Windows, transferring the file to the server and reading it into the template -> whitespace
4. Using a literal € in the template -> a literal € in the result
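The same glyph can be stored as different bytes, which is what separates attempts 2 and 3. A minimal diagnostic sketch (the helper name hex_dump is hypothetical) prints the raw bytes of a string: a UTF-8 euro shows up as e2 82 ac, while the alt-0128 character would typically show up as 80 if the file was saved as Windows-1252.

```perl
use strict;
use warnings;

# Hypothetical helper: render each byte of a string as two lowercase hex digits.
sub hex_dump {
    my $bytes = shift;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

print hex_dump("\xE2\x82\xAC"), "\n";
```

Reading the template with open(my $fh, '<:raw', $path) keeps Perl from reinterpreting the bytes before they are dumped.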
At this point I wrote a small script and verified that Unicode::MapUTF8 was returning whitespace when given the input from attempt 2. I then changed my code to protect the symbol from Unicode::MapUTF8:
sub sms_encode {
    my $text = shift or return undef;
    my $placeholder = 'THIS_WILL_BE_THE_EURO_SYMBOL';
    my $new =~ s/€/$placeholder/;    # Regex A: the euro as typed in UTF-8 (attempt 2)
    $new = from_utf8({-string => $text, -charset => 'ISO-8859-1'});
    $new =~ s/$placeholder/€/;       # Regex B: the euro as typed in attempt 3
    return $new;
}
The symbols may well not display correctly here: Regex A uses the symbol from attempt 2 above (the character typed in UTF-8), and Regex B uses the symbol from attempt 3 above (the character typed in ISO-8859-1). I then tried:
- The code as shown, without the utf8 pragma -> whitespace
- Replacing the symbol in Regex A with a literal ISO-8859-1 character, keeping Regex B the same -> whitespace
- The code as shown, with the utf8 pragma (use utf8) enabled -> whitespace, plus a warning about malformed UTF-8 (the character from Regex B)
- Using some arbitrary text in the template and in Regex A, with Regex B as in the code and no utf8 pragma -> the correct symbol
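In the placeholder version of sms_encode, Regex A is applied to the newly declared, still-undefined $new rather than to $text, so $text reaches from_utf8() unchanged and Regex B only finds a placeholder if the template already contains that text; this fits the results above. A sketch of the intended flow, with Regex A applied to a copy of $text. To keep it runnable without extra modules it swaps in the core Encode module for Unicode::MapUTF8 (an assumption), and it restores the euro as byte 0x80, assuming the recipient reads Windows-1252, since ISO-8859-1 has no euro code point. The function name sms_encode_placeholder is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical corrected placeholder version; Encode stands in for Unicode::MapUTF8.
sub sms_encode_placeholder {
    my $text = shift;
    return undef unless defined $text;
    my $placeholder = 'THIS_WILL_BE_THE_EURO_SYMBOL';

    # Regex A on a copy of $text (not on a fresh, undefined variable).
    # "\xE2\x82\xAC" is the UTF-8 byte sequence for U+20AC EURO SIGN.
    (my $protected = $text) =~ s/\xE2\x82\xAC/$placeholder/g;

    # UTF-8 bytes -> characters -> ISO-8859-1 bytes; the placeholder is plain ASCII.
    my $new = encode('iso-8859-1', decode('UTF-8', $protected));

    # Regex B: put the euro back as byte 0x80, its Windows-1252 position (assumption).
    $new =~ s/\Q$placeholder\E/\x80/g;
    return $new;
}
```

The /g flags also cover euro signs that arrive in the data rather than in the template.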
So although I have found a way to get symbols in my template to survive the encoding, I'm not at all satisfied with the solution, for several reasons, one being that it isn't robust enough to handle euro symbols that may appear in the data fed into the template.
Can anyone suggest how I could convert a euro symbol typed in UTF-8 (as opposed to an HTML entity) into the same symbol typed in ISO-8859-1?