epoptai has asked for the wisdom of the Perl Monks concerning the following question:
I've got a problem with the output of PerlMonks' chatterbox XML ticker. When a high-bit (non-ASCII) character like 'á' is entered in the CB, the character is not encoded; it's transmitted raw in the XML stream, which causes XML::Simple to die (as expected when receiving bad XML). It would be best if PerlMonks generated 'legal' XML, but that's not the case, so it needs to be dealt with. I don't know much about this subject, and have been using the following code from jcwren to convert the problem characters into underscores:
```perl
$xml =~ s/[\r\n\t]//g;
$xml =~ tr/\x80-\xff/_/;
$xml =~ tr/\x00-\x1f/_/;
```

That's very effective, but leaves something to be desired: the character behind the underscore. Since these characters can be detected and underscored, surely they can be detected and encoded properly? I've made many horribly broken attempts to encode these chrs but my lack of knowledge in this area always gets the last laugh.
Recently mirod posted "Converting character encodings", which includes a regex from XML::TiePYX that gets very close to doing the job, but it only encodes some of the characters, not all. It barfs on ¤ and probably others:
```perl
# This is the regex from XML::TiePYX
$xml =~ s{([\xc0-\xc3])(.)}{
    my $hi = ord($1);
    my $lo = ord($2);
    chr((($hi & 0x03) << 6) | ($lo & 0x3F))
}ge;
```

I seek an extended version of the XML::TiePYX regex to find and encode the full range of high-bit chrs specified in the first solution. I'd rather not use another module (XML parser or otherwise) for this task.
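To make the goal concrete, here is a minimal sketch of the kind of substitution being asked for. It rests on an assumption the thread does not confirm: that the ticker sends ISO-8859-1 (Latin-1) bytes, so each byte value equals the character's Unicode code point and can be written as a numeric character reference. The helper name `encode_high_bit` is hypothetical, not something from jcwren's or XML::TiePYX's code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): make a byte string safe for an
# XML parser. ASSUMES the input bytes are ISO-8859-1 (Latin-1), where each
# byte value equals the Unicode code point of the character it represents.
sub encode_high_bit {
    my ($xml) = @_;

    # Bytes 0x80-0xFF become numeric character references, e.g. 0xE1 -> &#225;
    $xml =~ s/([\x80-\xff])/'&#' . ord($1) . ';'/ge;

    # XML 1.0 does not allow control characters other than tab, LF and CR,
    # not even as character references, so those still become underscores.
    $xml =~ tr/\x00-\x08\x0b\x0c\x0e-\x1f/_/;

    return $xml;
}

my $raw = "<message>caf\xe9 \xa4 ok</message>";
print encode_high_bit($raw), "\n";   # prints <message>caf&#233; &#164; ok</message>
```

If the stream turns out to be in some other encoding (UTF-8 or Windows-1252, say), the byte-to-code-point mapping above would produce the wrong characters, so the encoding is worth checking before relying on this.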
thanks for your time - epoptai
--
Check out my Perlmonks Related Scripts like framechat,
reputer, and xNN.
Re: Regex to encode entities in XML
by mirod (Canon) on Jun 11, 2001 at 10:22 UTC

Re: Unescaped entities in XML
by mr.nick (Chaplain) on Jun 11, 2001 at 02:32 UTC
by epoptai (Curate) on Jun 11, 2001 at 06:02 UTC

Re: Regex to encode entities in XML
by mirod (Canon) on Jun 11, 2001 at 12:32 UTC

Re: Regex to encode entities in XML
by ChemBoy (Priest) on Jun 12, 2001 at 00:04 UTC

Re: Regex to encode entities in XML
by Anonymous Monk on Sep 02, 2009 at 13:38 UTC