in reply to ':encoding(UTF-8)' corrupts strings from XML::LibXML which doesn't return unicode strings ?
It may be because the Microsoft website isn't indicating the document's UTF-8-ness in the HTTP headers. If you do the HTTP fetch outside XML::LibXML (using LWP::Simple), all is OK...
use LWP::Simple 'get';
use XML::LibXML;

binmode STDOUT, ':encoding(UTF-8)';

my $str = XML::LibXML->new( qw/ recover 2 / )->load_html(
    string => get q{http://msdn.microsoft.com/en-us/library/aa664812(v=vs.71).aspx},
)->find(
    q{/html/body/div/div[2]/div[2]/div[3]/div[3]/dl[15]/dd[29]}
)->get_node(0)->textContent;

print $str;
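For anyone wondering what the "corruption" looks like at the byte level, here is a minimal sketch. It assumes the problem in this thread is the usual double encoding: UTF-8 bytes that were never decoded get treated as Latin-1 characters and are then encoded a second time by the `:encoding(UTF-8)` layer. The variable names are made up for illustration, and `encode` stands in for what the output layer does.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "é" as raw UTF-8 bytes, the way it arrives over the wire.
my $bytes = "\xC3\xA9";

# Wrong: the bytes were never decoded, so each byte is treated as a
# separate Latin-1 character and encoded again on output.
my $mojibake = encode('UTF-8', $bytes);

# Right: decode the bytes into a character string first, then encode.
my $chars = decode('UTF-8', $bytes);
my $ok    = encode('UTF-8', $chars);

print unpack('H*', $mojibake), "\n";   # c383c2a9
print unpack('H*', $ok), "\n";         # c3a9
```

If the parser can't tell the document is UTF-8, you end up in the first case, which would explain why fetching and decoding the page yourself helps.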
Re^2: ':encoding(UTF-8)' corrupts strings from XML::LibXML which doesn't return unicode strings ?
by Anonymous Monk on Feb 28, 2013 at 23:39 UTC