omacneil has asked for the wisdom of the Perl Monks concerning the following question:
We have a database dump that is (mostly) in ISO-8859-1 or, more likely, Windows-1252. The database was populated by a web form that encouraged browsers to give us text in these charsets because its head section included:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
...and Internet Exploder interprets iso-8859-1 as license to give you Windows-1252.
We are converting to UTF-8 because it is clearer and better supports non-English characters.
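To illustrate why that browser behaviour matters: ISO-8859-1 and Windows-1252 agree on every byte except 0x80 to 0x9F, where ISO-8859-1 has control characters and Windows-1252 has printable ones such as curly quotes and the euro sign. A minimal sketch using the core Encode module follows; the sample bytes and the `codepoints` helper are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: "smart quotes" around "hi", as a Windows browser
# might submit them.
my $bytes = "\x93hi\x94";

# Return the code points of a decoded string as U+XXXX strings.
sub codepoints {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $string;
}

my $as_latin1 = codepoints(decode('iso-8859-1', $bytes));
my $as_cp1252 = codepoints(decode('cp1252',     $bytes));

print "iso-8859-1: $as_latin1\n";   # U+0093 U+0068 U+0069 U+0094
print "cp1252:     $as_cp1252\n";   # U+201C U+0068 U+0069 U+201D
```

If the dump contains bytes in the 0x80 to 0x9F range, decoding it as `cp1252` rather than `iso-8859-1` is probably the safer assumption.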
The somewhat simplified code is:
use Encode qw(from_to);
binmode DATA, ':utf8';
my $original = <DATA>;
my $converted = $original;
from_to($converted, 'iso-8859-01', 'utf-8');
from_to($converted, 'utf-8', 'iso-8859-01');
print $converted eq $original ? 'round trip ok' : 'changed';
__DATA__
some chars that in reality aren't all low ascii
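For comparison, here is a sketch of the same round trip done on raw bytes, assuming the dump really is Windows-1252 and using the encoding names `cp1252` and `UTF-8` that Encode documents. `from_to` converts octets in place, so no `:utf8` layer is applied to the input here. The sample string is made up.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical input: "cafe" with e-acute, a space, and a euro sign,
# all as cp1252 bytes.
my $original  = "caf\xE9 \x80";
my $converted = $original;

# from_to() rewrites its first argument in place and returns the new
# length in octets, or undef on failure.
from_to($converted, 'cp1252', 'UTF-8');
my $utf8_hex = join ' ', map { sprintf '%02X', ord } split //, $converted;
print "as UTF-8: $utf8_hex\n";   # 63 61 66 C3 A9 20 E2 82 AC

from_to($converted, 'UTF-8', 'cp1252');
print $converted eq $original ? "round trip ok\n" : "changed\n";
```

The 0xC3 byte in the hex dump is expected: it is part of the two-byte UTF-8 encoding of the accented letter, not a stray character.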
Our problem is that our UTF-8 output doesn't show up correctly in the terminal. As near as we can tell from the locale command and the Terminal->Set Character Encoding menu item in gnome-terminal, the terminal is set to UTF-8.
For example, some of our converted UTF-8 output contains a bunch of 0xC2 and 0xC3 (194 and 195) chars:
perl -e 'binmode STDOUT,":utf8"; print chr(0xC2),"\n";'
...gives a LATIN CAPITAL LETTER A WITH CIRCUMFLEX (according to gnome-character-map), which is not in the input.
Maybe we don't know what charset the input is in?
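One possible reading of this one-liner, offered as a sketch rather than a diagnosis: `chr(0xC2)` is the character U+00C2, and the `:utf8` layer encodes that character as the two bytes 0xC3 0x82, which a UTF-8 terminal correctly draws as Â. Separately, 0xC2 and 0xC3 bytes in converted output are expected, since they are the lead bytes UTF-8 uses for every character from U+0080 to U+00FF. The `utf8_hex` helper and the sample characters below are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hex dump of the UTF-8 encoding of a character string.
sub utf8_hex {
    my ($chars) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, encode('UTF-8', $chars);
}

my $a_circumflex = utf8_hex(chr(0xC2));   # what the one-liner sends to the terminal
my $e_acute      = utf8_hex(chr(0xE9));   # e with acute accent
my $nbsp         = utf8_hex(chr(0xA0));   # no-break space

print "U+00C2 -> $a_circumflex\n";   # C3 82
print "U+00E9 -> $e_acute\n";        # C3 A9
print "U+00A0 -> $nbsp\n";           # C2 A0
```

Looking at the byte that follows each 0xC2 or 0xC3 in the real output would show which character was actually encoded.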
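One way to test that suspicion, sketched under the assumption that the dump can be read as raw bytes: text that is really ISO-8859-1 or Windows-1252 and contains accented characters is very unlikely to also be valid UTF-8, so a strict UTF-8 decode that succeeds on non-ASCII input suggests the data is already UTF-8, and converting it again would double-encode it. The helper name `looks_like_utf8` and the sample byte strings are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# True if $bytes contains non-ASCII bytes and decodes as strict UTF-8.
sub looks_like_utf8 {
    my ($bytes) = @_;                          # local copy; decode may modify it
    return 0 unless $bytes =~ /[\x80-\xFF]/;   # pure ASCII tells us nothing
    return eval { decode('UTF-8', $bytes, FB_CROAK); 1 } ? 1 : 0;
}

my $utf8_sample   = "caf\xC3\xA9 au lait";   # e-acute encoded as UTF-8
my $latin1_sample = "caf\xE9 au lait";       # e-acute as ISO-8859-1 / cp1252

print looks_like_utf8($utf8_sample)   ? "already UTF-8\n" : "not UTF-8\n";
print looks_like_utf8($latin1_sample) ? "already UTF-8\n" : "not UTF-8\n";
```

This is a heuristic only: it cannot tell ISO-8859-1 from Windows-1252, and it says nothing about input that is pure ASCII.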
UPDATE: set binmode per Anonymous Friend
Replies are listed 'Best First'.

Re: display of utf8
by moritz (Cardinal) on Aug 12, 2009 at 06:32 UTC

Re: display of utf8
by Anonymous Monk on Aug 12, 2009 at 04:53 UTC

Re: display of utf8
by ikegami (Patriarch) on Aug 12, 2009 at 15:10 UTC

Re: display of utf8
by grantm (Parson) on Aug 14, 2009 at 00:19 UTC