in reply to Re^4: Mugged by UTF8, this CANNOT be right
in thread Mugged by UTF8, this CANNOT be right
So do you see how I could easily be lulled into thinking I shouldn't have to decode strings of text (not binary data!)
Good, because you must not decode strings of text. Decoding is the process of turning "binary data" (encoded bytes) into text.
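To make the bytes-versus-text distinction concrete, here is a minimal sketch. The variable names are only illustrative, and the two bytes are assumed to stand in for whatever arrives from a file, socket, or database handle.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Assumed input: the UTF-8 encoding of U+00E9, as raw bytes.
my $bytes = "\xC3\xA9";

# Decoding turns the two bytes into one character of text.
my $text = decode('UTF-8', $bytes);

# Encoding goes the other way, from text back to bytes.
my $round_trip = encode('UTF-8', $text);

printf "bytes: %d, characters: %d, code point: U+%04X\n",
    length($bytes), length($text), ord($text);
```

Decode once on the way in and encode once on the way out; anything that is already text should be left alone in between.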
The statement $row->{UnicodeCharacter} = decode_utf8($row->{UnicodeCharacter}); is a no-op.
Ugh. The Unicode bug :(
    use strict;
    use warnings;

    use Test::More tests => 4;

    use Encode qw( encode_utf8 decode_utf8 );

    my $x1 = chr(0xE9);
    my $y1 = encode_utf8($x1);
    my $z  = encode_utf8($y1);
    my $y2 = decode_utf8($z);
    my $x2 = decode_utf8($y2);

    is($y2, $y1);
    is($x2, $x1);  # Fails
    isnt($x1, $y1);
    isnt($y1, $z);

    1;
    1..4
    ok 1
    not ok 2
    #   Failed test at a.pl line 15.
    #          got: 'é'
    #     expected: 'é'
    ok 3
    ok 4
    # Looks like you failed 1 test of 4.
This is better:
    use strict;
    use warnings;

    use Test::More tests => 4;

    my $x1 = chr(0xE9);
    utf8::encode( my $y1 = $x1 );
    utf8::encode( my $z  = $y1 );
    utf8::decode( my $y2 = $z  );
    utf8::decode( my $x2 = $y2 );

    is($y2, $y1);
    is($x2, $x1);
    isnt($x1, $y1);
    isnt($y1, $z);

    1;
    1..4
    ok 1
    ok 2
    ok 3
    ok 4
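As a rough sketch of why the utf8:: functions behave consistently here, the snippet below runs the same double-encode, double-decode round trip on one string held in both of Perl's internal representations. The variable names are made up for the illustration; utf8::upgrade is used only to force the second representation.

```perl
use strict;
use warnings;

# The same one-character string, stored two different ways internally.
my $native   = chr(0xE9);
my $upgraded = chr(0xE9);
utf8::upgrade($upgraded);    # changes the internal storage, not the string

my @results;
for my $start ($native, $upgraded) {
    # Each call works in place on a fresh copy, as in the test above.
    utf8::encode( my $once  = $start );
    utf8::encode( my $twice = $once  );
    utf8::decode( my $back1 = $twice );
    utf8::decode( my $back2 = $back1 );

    # Record whether each decode step undid the matching encode step.
    push @results, ( $back1 eq $once ) && ( $back2 eq $start ) ? 1 : 0;
}

print "round trips ok: @results\n";
```

Both round trips should come back to the starting string, which is the property the failing is($x2, $x1) check was looking for.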
Re^6: Mugged by UTF8, this CANNOT be right
by tosh (Scribe) on Jan 27, 2011 at 09:39 UTC
by ikegami (Patriarch) on Jan 27, 2011 at 18:17 UTC