Did you google? http://mail.gnome.org/archives/xml/2003-January/msg00038.html
> You pasted in the tree substrings which were not UTF8, check the input you
> store in the tree for proper encoding. I assume you have read and understood:
> http://xmlsoft.org/encoding.html
>
> Daniel

To your snippet, I added

    use Encode::Guess;
    die "guessing encoding ", guess_encoding($content, Encode->encodings(":all") );

and I got

    UTF-32BE:Partial character at G:/Perl/lib/Encode/Guess.pm line 124.
    UCS-2LE:Partial character at G:/Perl/lib/Encode/Guess.pm line 124.
    UTF-32LE:Partial character at G:/Perl/lib/Encode/Guess.pm line 124.
    UTF-16BE:Partial character at G:/Perl/lib/Encode/Guess.pm line 124.
    UTF-16LE:Partial character at G:/Perl/lib/Encode/Guess.pm line 124.
    UTF-32:Unrecognised BOM 3c534352 at G:/Perl/lib/Encode/Guess.pm line 124.

The "bad" response has a meta tag that says CHARSET=gb2312, so I did a search and saw that Encode::CN mentions gb2312. I hope this helps.
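One way to act on the meta-tag hint is to narrow the guess instead of throwing every encoding at it: guess_encoding takes a list of extra suspects, and with only 'euc-cn' added (the Encode::CN encoding that gb2312 maps to), the guess stops being ambiguous. This is a minimal sketch; the sample bytes ("中文" in GB2312) and the variable names are illustrative assumptions, not the data from the original snippet.

```perl
use strict;
use warnings;
use Encode::Guess;   # exports guess_encoding()

# Illustrative input: an ASCII tag followed by the GB2312 bytes for "中文".
my $bytes = "<META CHARSET=gb2312>\xD6\xD0\xCE\xC4";

# The default suspects are ascii and utf8 (plus BOM sniffing); adding only
# euc-cn avoids the ambiguity you get from Encode->encodings(":all").
my $enc = guess_encoding($bytes, 'euc-cn');

# On failure guess_encoding returns an error string instead of an object.
ref $enc or die "Can't guess: $enc";

my $text = $enc->decode($bytes);
print $enc->name, "\n";
```

If the guess succeeds you get an Encode::Encoding object back, so the same object does the decoding.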
Update: try clean_html(decode('euc-cn', $content)); it will help (man, this has got to say something about your debugging skills).
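Rather than hard-coding 'euc-cn', you could take the charset from the page's own meta tag before handing the text to clean_html. The helper below is a hypothetical sketch (decode_by_meta_charset is not from the original code); it assumes the charset appears in a `<meta ... charset=...>` tag and falls back to UTF-8 otherwise. It relies on Encode's alias table mapping "gb2312" to euc-cn.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: decode raw HTML bytes using the charset named in a
# <meta ... charset=...> tag, falling back to UTF-8 when there is no tag
# or the named charset is unknown to Encode.
sub decode_by_meta_charset {
    my ($bytes) = @_;
    my ($charset) = $bytes =~ /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i;
    my $enc = (defined $charset && find_encoding($charset))
           || find_encoding('utf-8');
    return $enc->decode($bytes);
}

# "gb2312" resolves to euc-cn, so the trailing GB2312 bytes for "中文"
# become two Unicode characters.
my $html = qq{<META http-equiv="Content-Type" content="text/html; CHARSET=gb2312">\xD6\xD0\xCE\xC4};
my $text = decode_by_meta_charset($html);
print substr($text, -2) eq "\x{4E2D}\x{6587}" ? "decoded\n" : "mismatch\n";
```

The result is a Perl character string, which is what you want to feed to clean_html instead of the raw bytes.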
MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x. I take requests.
The third rule of perl club is a statement of fact: pod is sexy.
In reply to "Re: Crashing XML::LibXML by setting UserAgent" by PodMaster, in thread "Crashing XML::LibXML by setting UserAgent" by hacker