To transform an XML entity like
    &#xFC;
into the corresponding UTF-8-encoded Unicode character, the following substitution can be used, given that the string this is performed on is a Unicode string:
    s/&#x([0-9a-f]+);/chr(hex($1))/ige
Now, instead of having to write this snippet down over and over again, I'd prefer something like
    use URI::Escape; $str = uri_unescape($safe);
which I use all the time, not because URL-unescaping is terribly complicated, but because for such a common operation there ought to be a standard procedure.
So ... is there a module on CPAN that does something similar? If not, I'll be happy to put one up there.
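To make the request concrete, here is a minimal sketch of the kind of helper I have in mind. The function name decode_hex_entities is purely hypothetical (it is not an existing CPAN interface), and the sketch only handles hexadecimal references of the form &#xHHHH;, exactly like the substitution above.

```perl
use strict;
use warnings;

# Hypothetical helper: the name is illustrative, not an existing module's API.
sub decode_hex_entities {
    my ($str) = @_;
    # Replace each &#xHHHH; reference with the character at that code point.
    # /i accepts upper- and lower-case hex digits, /g replaces every match,
    # /e evaluates the replacement as Perl code.
    $str =~ s/&#x([0-9a-f]+);/chr(hex($1))/ige;
    return $str;
}

my $decoded = decode_hex_entities('M&#xFC;nchen');
# $decoded is now a character string containing U+00FC.
```

The result is a string of characters, not bytes, so it still has to be encoded (for example with Encode::encode_utf8) before it is written out as UTF-8.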
By the way, XML::DOM provides a function called XmlUtf8Encode which does a lot more than calling chr(), but I guess that's because it tries to cope with older perl releases that didn't support Unicode well. Any insight on this would be appreciated as well.
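For comparison, this is roughly what assembling the UTF-8 bytes by hand looks like when chr() cannot be relied on to handle code points above 255. It is only my own sketch of the general technique, not XML::DOM's actual code, and the name utf8_bytes_for is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical illustration: build the UTF-8 byte sequence for one code point
# by hand, without relying on Perl's native Unicode support.
sub utf8_bytes_for {
    my ($n) = @_;
    # 1 byte: 0xxxxxxx
    return chr($n) if $n < 0x80;
    # 2 bytes: 110xxxxx 10xxxxxx
    return pack('C2', 0xC0 | ($n >> 6), 0x80 | ($n & 0x3F)) if $n < 0x800;
    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return pack('C3', 0xE0 | ($n >> 12),
                      0x80 | (($n >> 6) & 0x3F),
                      0x80 | ($n & 0x3F)) if $n < 0x10000;
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return pack('C4', 0xF0 | ($n >> 18),
                      0x80 | (($n >> 12) & 0x3F),
                      0x80 | (($n >> 6) & 0x3F),
                      0x80 | ($n & 0x3F)) if $n < 0x110000;
    die "code point out of range: $n";
}

my $bytes = utf8_bytes_for(0xFC);   # the two bytes 0xC3 0xBC
```

Note that this returns a byte string, whereas chr(hex($1)) returns a character string, so the two approaches are not drop-in replacements for each other.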
(Hex-entity corrected, thanks eserte.)
In reply to Decode XML &#xxxx; entities by saintmike