in reply to XML::Parser chokes on UTF-8?
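The last 'Ã' looks very suspicious. Latin1 characters outside the basic 0-127 range are stored on 2 bytes in UTF-8. Accented letters such as 'é' or 'ç' start with the byte 0xC3, so when those bytes are displayed as latin1 they show up as 'Ã' followed by a second character ('Ã©', 'Ã§'). A lone 'Ã' is certainly an error. My guess would be that a cut'n paste went wrong and the last byte of the string (the second half of that character) was lost.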
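To make the byte-level explanation concrete, here is a minimal sketch using the core Encode module (my choice for illustration; it is not part of the original check). It encodes 'é', shows its two UTF-8 bytes, decodes them as latin1 to reproduce the 'Ã©' display, then drops the last byte to leave a lone 0xC3. `hex_of` is a hypothetical helper used only for display.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Print characters as UTF-8 so the latin1 misreading below shows up as 'Ã©'
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: show a string as space-separated hex codes
sub hex_of { join ' ', map { sprintf '%02X', ord } split //, shift }

my $char  = "\x{e9}";                  # 'é', code point 0xE9 (> 127)
my $bytes = encode('UTF-8', $char);     # stored on 2 bytes in UTF-8
print 'UTF-8 bytes:    ', hex_of($bytes), "\n";       # C3 A9

# Reading those 2 bytes as latin1 gives 2 characters: 'Ã' then '©'
my $as_latin1 = decode('iso-8859-1', $bytes);
print "read as latin1: $as_latin1\n";

# Losing the last byte leaves a lone 0xC3, which displays as a lone 'Ã'
my $truncated = substr $bytes, 0, -1;
print 'truncated:      ', hex_of($truncated), "\n";   # C3
```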
I used this bit of code to check the string BTW:
#!/usr/bin/perl -w
use strict;
use XML::Parser;
use Text::Iconv;

# read the suspicious string (raw UTF-8 bytes) from the DATA section
my $text= <DATA>;

# make convert() die on invalid input instead of silently returning undef
Text::Iconv->raise_error(1);
my $converter= Text::Iconv->new( utf8 => 'latin1');
my $in_latin1= $converter->convert( $text);
print "text in latin1: $in_latin1\n";

__DATA__
<title>Os cem melhores contos brasileiros do século /Italo Moriconi, + organização, introdução e refer</title>
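If Text::Iconv isn't installed, a similar check can be sketched with the core Encode module instead (a swapped-in approach, not what the code above uses). `check_utf8` is a hypothetical helper: `decode` with `FB_CROAK` dies on malformed or truncated UTF-8, so a string ending in a lone 0xC3 is rejected while 'século' passes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns undef if $bytes is valid UTF-8, else the error
sub check_utf8 {
    my ($bytes) = @_;
    # decode may modify its source when CHECK is set, so work on a copy
    my $copy = $bytes;
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? undef : $@;
}

my $good = "s\xC3\xA9culo";   # 'século' as UTF-8 bytes
my $bad  = "refer\xC3";       # ends with a lone 0xC3, like the suspicious 'Ã'

for my $s ($good, $bad) {
    my $err = check_utf8($s);
    print defined $err ? "invalid: $err" : "valid\n";
}
```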