garliqua has asked for the wisdom of the Perl Monks concerning the following question:
Wise monks, I've got one for you.
I'm developing a content management system where the data are all stored in XML files. Everything is groovy with one exception: if a user submits a web page containing high-bit characters (such as \x92, the Windows-1252 single right quote), the XML parser (XML::Simple, which uses XML::Parser) coughs, sputters and dies.
So what I'd like is a single regexp that goes through the text and changes every \xNN character to its XHTML numeric character reference. For instance, the single character \x92 would become &#146; (which displays as ’).
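Strictly speaking, \x92 is a right single quote only in the Windows-1252 encoding; in Unicode that character is U+2019, so the reference that actually means the quote in XHTML is &#8217; rather than &#146;. A minimal sketch, assuming the browser submitted Windows-1252 bytes (the sample string is made up), that decodes with the core Encode module first and then escapes:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the submitted text arrives as Windows-1252 bytes,
# which is the encoding in which \x92 means a right single quote.
my $bytes = "It\x92s";

# Decode to Perl characters: the byte \x92 becomes U+2019.
my $text = decode('cp1252', $bytes);

# Replace every non-ASCII character with a numeric character reference.
(my $escaped = $text) =~ s/([^\x00-\x7f])/'&#' . ord($1) . ';'/ge;

print "$escaped\n";    # prints It&#8217;s
```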
My problem is that I can't seem to get something along these lines to work:
s/\x(\d+)/'&#' . hex($1) . ';'/ge
I think I know why this doesn't work: \d+ matches a run of literal digit characters, whereas what I want to match is the single character that an escape such as \x92 or \x93 stands for.
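A minimal sketch of the single-regex approach, assuming $block holds the submitted text as raw (undecoded) bytes: match the character itself with a character class, here [\x7f-\xff] to cover the same 127..255 range as the loop below, and take its code with ord() instead of capturing digits and calling hex(). The sample string is made up for illustration.

```perl
use strict;
use warnings;

# Assumption: $block holds raw bytes, so \x92 is one character with ord 146.
my $block = "Don\x92t \x93quote\x94 me";

# The class matches the actual high-bit character, not the four-character
# text "\x92"; ord() then yields its numeric code (146 for \x92).
$block =~ s/([\x7f-\xff])/'&#' . ord($1) . ';'/ge;

print "$block\n";    # prints Don&#146;t &#147;quote&#148; me
```

One caveat on the design: to an XML parser, &#146; denotes U+0092 (a C1 control code), not the quote; HTML parsers remap it to U+2019 only for compatibility.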
If I can avoid it, I'd rather not do something like:
for (127 .. 255) {
    my $regexp = "s/\\x" . sprintf("%x", $_) . "/&#$_;/g";
    eval "\$block =~ $regexp;";    # \$ so eval sees the variable, not its contents
}
Perhaps there is a solution involving pack(), though it hasn't occurred to me yet.
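For the pack() idea, a minimal sketch using its counterpart unpack(), assuming $block is a byte string (the sample value is made up): unpack 'C*' turns the string into a list of byte values, and map rebuilds it with references for the high-bit ones.

```perl
use strict;
use warnings;

my $block = "Caf\xe9 \x92ok\x92";

# unpack 'C*' splits the string into byte values (0..255); map turns
# values of 127 and above into references and leaves the rest as characters.
my $escaped = join '',
    map { $_ >= 0x7f ? "&#$_;" : chr $_ } unpack 'C*', $block;

print "$escaped\n";    # prints Caf&#233; &#146;ok&#146;
```

This does the same job as the one-line substitution, just one byte at a time; the regex form is usually shorter and easier to read.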
Any ideas?
Thanks.
Re: Control Characters (\xNN) in HTML
by blackmateria (Chaplain) on Oct 18, 2001 at 20:52 UTC
by tommyw (Hermit) on Oct 18, 2001 at 22:24 UTC
by garliqua (Novice) on Oct 21, 2001 at 02:51 UTC

Re: Control Characters (\xNN) in HTML
by tommyw (Hermit) on Oct 18, 2001 at 20:38 UTC
by garliqua (Novice) on Oct 18, 2001 at 21:03 UTC

Re: Control Characters (\xNN) in HTML
by scain (Curate) on Oct 18, 2001 at 20:42 UTC