in reply to Re: HTML parsing module handles known and unknown encoding
in thread HTML parsing module handles known and unknown encoding
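That works fine for XML, since an XML document must specify its encoding within the document itself (which effectively makes it a binary format), but not so well for HTML, where the encoding is usually specified outside of the document (a text format), for example in the HTTP Content-Type header.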
I don't see any way of specifying the encoding of an HTML document, which is weird because XML::LibXML supposedly handles HTML.
XML::LibXML handles UTF-16 just fine.
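
To make the in-document versus out-of-band distinction concrete, here is a minimal sketch in Python (standard library only, not Perl and not XML::LibXML). The function names `xml_declared_encoding` and `html_charset_from_header` are hypothetical, made up for this illustration. The XML detection is deliberately simplified: it looks only at a byte order mark and an ASCII-compatible XML declaration, and ignores cases such as UTF-32 or BOM-less UTF-16.

```python
import re
from email.message import Message


def xml_declared_encoding(data: bytes) -> str:
    """Guess the encoding an XML document states about itself (simplified)."""
    # A byte order mark identifies UTF-16 or UTF-8 before any text is decoded.
    if data.startswith(b"\xff\xfe") or data.startswith(b"\xfe\xff"):
        return "utf-16"
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # Otherwise look for encoding="..." inside the leading XML declaration.
    m = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
        data,
    )
    # With no BOM and no declaration, XML defaults to UTF-8.
    return m.group(1).decode("ascii").lower() if m else "utf-8"


def html_charset_from_header(content_type: str):
    """Read the charset from an HTTP Content-Type value, or None if absent."""
    # The information lives in the transport metadata, not in the HTML bytes.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()
```

The XML function needs nothing but the document bytes, whereas the HTML function never looks at the document at all. That is why an HTML parser needs some way for the caller to pass the encoding in.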
Replies are listed 'Best First'.

Re^3: HTML parsing module handles known and unknown encoding
by ambrus (Abbot) on Nov 17, 2011 at 08:50 UTC
by ikegami (Patriarch) on Nov 17, 2011 at 09:56 UTC

Re^3: HTML parsing module handles known and unknown encoding
by grantm (Parson) on Nov 16, 2011 at 21:07 UTC
by ikegami (Patriarch) on Nov 16, 2011 at 22:19 UTC
by grantm (Parson) on Nov 17, 2011 at 23:42 UTC

Re^3: HTML parsing module handles known and unknown encoding
by Corion (Patriarch) on Nov 16, 2011 at 19:07 UTC