saintmike has asked for the wisdom of the Perl Monks concerning the following question:
To transform an XML entity like

    &#xFC;

into the corresponding utf-8-encoded Unicode character, the following substitution can be used, given that the string this is performed on is a Unicode string:

    s/&#x([0-9a-f]+);/chr(hex($1))/ige

Now, instead of having to write this snippet down over and over again, I'd prefer something like

    use URI::Escape; $str = uri_unescape($safe);

which I use all the time, not because URL-unescaping is terribly complicated, but because for such a common operation there ought to be a standard procedure.
So ... is there a module on CPAN that does something similar? If not, I'll be happy to put one up there.
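To make the request concrete, here is a minimal sketch of the kind of interface meant, mirroring the shape of uri_unescape(). The function name decode_hex_entities is hypothetical and does not refer to any existing CPAN module; it only wraps the substitution shown in the question and handles hexadecimal entities, nothing else.

```perl
use strict;
use warnings;

# Hypothetical helper (not an existing CPAN API): replaces every
# hexadecimal character reference such as &#xFC; with the character
# it names. The input is assumed to be a Perl character string.
sub decode_hex_entities {
    my ($str) = @_;
    $str =~ s/&#x([0-9a-f]+);/chr(hex($1))/ige;
    return $str;
}

my $decoded = decode_hex_entities('M&#xFC;nchen');

# Encode to UTF-8 only on output, so the string stays a character string.
binmode(STDOUT, ':encoding(UTF-8)');
print "$decoded\n";
```

Because the substitution runs in a single pass, text produced by a replacement is not scanned again, so an input like `&#x26;#xFC;` decodes to the literal text `&#xFC;` rather than to the character.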
By the way, XML::DOM provides a function called XmlUtf8Encode which does a lot more than calling chr(), but I guess that's because it tries to cope with older perl releases that didn't support Unicode well. Any insight on this would be appreciated as well.
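For illustration only, this is roughly what producing UTF-8 by hand looks like when chr() cannot be relied on for code points above 255. The function utf8_bytes_by_hand is a hypothetical name and this is not XML::DOM's actual code; it is a sketch under the assumption that such a routine builds the byte sequence with bit arithmetic, compared here against the Encode module that ships with modern perls.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical hand-rolled encoder: builds the UTF-8 byte sequence for a
# code point with bit arithmetic instead of relying on chr().
sub utf8_bytes_by_hand {
    my ($n) = @_;
    return pack('C', $n) if $n < 0x80;
    return pack('CC', 0xC0 | ($n >> 6), 0x80 | ($n & 0x3F)) if $n < 0x800;
    return pack('CCC', 0xE0 | ($n >> 12),
                       0x80 | (($n >> 6) & 0x3F),
                       0x80 | ($n & 0x3F)) if $n < 0x10000;
    return pack('CCCC', 0xF0 | ($n >> 18),
                        0x80 | (($n >> 12) & 0x3F),
                        0x80 | (($n >> 6) & 0x3F),
                        0x80 | ($n & 0x3F));
}

# On a Unicode-aware perl, chr() plus Encode yields the same bytes.
my $by_hand = utf8_bytes_by_hand(0xFC);
my $modern  = encode('UTF-8', chr(0xFC));
print $by_hand eq $modern ? "same bytes\n" : "different bytes\n";
```

The difference that matters is the result type: the hand-rolled version returns a byte string, whereas chr() returns a character string that is only turned into bytes when it is encoded on output.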
(Hex-entity corrected, thanks eserte.)
Replies are listed 'Best First'.

Re: Decode XML &#xxxx; entities
by eserte (Deacon) on Dec 04, 2007 at 20:12 UTC
by saintmike (Vicar) on Dec 04, 2007 at 21:07 UTC

Re: Decode XML &#xxxx; entities
by moritz (Cardinal) on Dec 04, 2007 at 18:20 UTC
by saintmike (Vicar) on Dec 04, 2007 at 18:33 UTC
by moritz (Cardinal) on Dec 04, 2007 at 18:51 UTC