/me is not an XML expert, but so far, s/([^\x20\x21\x23-\x25\x28-\x3b\x3d\x3f-\x7e])/sprintf "&#%d;", ord $1/ge has been very helpful for me. (Although most of the time I am too lazy to look up which characters are valid, and just use s/(\W)/sprintf "&#%d;", ord $1/ge instead. And even more often, I am too lazy to even write XML myself, and use a module for that.)
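To make the effect of that substitution concrete, here is a minimal sketch that wraps it in a sub and runs it on a sample string. The name escape_xml_text and the sample input are made up for illustration, and the sketch assumes the input is a decoded character string rather than raw bytes. It is not a complete XML escaper: references to most control characters are not allowed in XML 1.0, so such input would still need to be filtered separately.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption, not from any module):
# replace every character outside a conservative "safe" ASCII set with a
# decimal character reference. The safe set is printable ASCII minus
# the double quote, ampersand, apostrophe, < and >.
sub escape_xml_text {
    my ($text) = @_;
    $text =~ s/([^\x20\x21\x23-\x25\x28-\x3b\x3d\x3f-\x7e])/sprintf "&#%d;", ord $1/ge;
    return $text;
}

# Sample input chosen for illustration only.
print escape_xml_text(q{Tom & "Jerry" <cat>}), "\n";
# prints: Tom &#38; &#34;Jerry&#34; &#60;cat&#62;
```

Because everything outside the safe set becomes a numeric reference, the output is plain ASCII and does not depend on the feed's declared encoding, which is probably why this lazy approach tends to work.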
I also thank whoever made headlines.rdf, because it is a wonderful idea. But that does not mean I like how it works (doesn't work), and even wonderful ideas need to be implemented correctly.
On PerlMonks, when reporting a bug, you get the strangest answers. I thought the open source community got ruder by the day, but apparently the monster that is called "WONTFIX" or "Patches welcome" has affected the closed source community as well.
The common advice is "Don't use regexes to parse XML, use an XML parser". Especially in this very monastery, this is said a lot. But when the XML is broken, of course (?), instead of asking for suggestions or maybe even rudely mentioning that patches are welcome, the people here document that it is broken and suggest parsing XML *without* a normal XML parser!
This is a very well known development strategy:
- Something breaks
- The broken behaviour is documented
- Everyone who expects things to work is wrong. After all, the bug is documented and therefore a feature.
- Documented behaviour never needs to be corrected
I disagree. If you don't know how to fix it, there are places where you can ask for help. Actually, that place is here, in our very own Seekers of Perl Wisdom. But I really cannot believe that whoever made this feed doesn't know how to fix it.
But yes, if the powers that be are unwilling to make the RDF be XML, it should indeed be removed and replaced by something that doesn't fool people into believing that it is XML.
In my opinion, a fix is appropriate, and not at all hard. Please tell the people that have access to the code.
I accept your offer to refund the purchase price. Thank you very much.
Juerd
# { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
P.S. Yes, XML is scary. I also like to avoid it. But when there is a standard, and you choose to use that standard, make sure you are compliant. If you don't want to use the standard, don't use its syntax.