Well, parsing these nasty text files is quite difficult, so it would be nice to make the simple things easier. It's not a limitation of XML::Writer; it's more a limitation of my input (and my parser).
Most of the important pieces of the parsed news story wind up stored in keys of a hash -- headline, byline, pubdate, etc. These are plain, unmarked text fields, which makes XML::Writer a simple tool for generating the XML file, tags and all.
There's also a hash element which stores the text of the story. I use URI::Find and Email::Find to find and set links in this text field. My parser also has to add tags to denote other important areas of the text: sub-headlines, context graphs, etc.
So my story text field winds up with a few embedded tags. When it comes time to print the paragraphs of my text to the XML file, XML::Writer escapes the < and > characters in my tags. I'm sure I could chunk through my text paragraphs and look for these tags -- then use XML::Writer to toss each one in (if it's a valid NITF tag). But I'm on a silly deadline to get this thing done, and I have no help (other than the monks!).
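For reference, the "chunk through the text" approach might look something like the sketch below. It is only a rough illustration: the %allowed whitelist and the write_marked_up_text helper are made-up names, the tag list is a placeholder rather than a real NITF element list, and the regexes only cope with simple double-quoted attributes. Attribute values are passed back through XML::Writer, so anything already entity-encoded in them would get escaped a second time.

```perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical whitelist; replace with the NITF elements the parser emits.
my %allowed = map { $_ => 1 } qw(a em hl2);

# Replay one paragraph through XML::Writer. Whitelisted tags go through
# startTag/endTag/emptyTag; everything else goes through characters(),
# so it is escaped as usual.
sub write_marked_up_text {
    my ($writer, $text) = @_;

    # The capturing group makes split return the tags as separate chunks.
    for my $chunk (split /(<[^<>]+>)/, $text) {
        next if $chunk eq '';

        if ($chunk =~ m{^</([\w.-]+)\s*>$} && $allowed{$1}) {
            $writer->endTag($1);
        }
        elsif ($chunk =~ m{^<([\w.-]+)((?:\s+[\w:.-]+="[^"]*")*)\s*(/?)>$}
               && $allowed{$1}) {
            my ($name, $attr_text, $empty) = ($1, $2, $3);
            # Flatten name="value" pairs into a (name, value, ...) list.
            my @attrs = $attr_text =~ /([\w:.-]+)="([^"]*)"/g;
            if ($empty) { $writer->emptyTag($name, @attrs) }
            else        { $writer->startTag($name, @attrs) }
        }
        else {
            # Plain text, or a tag that is not on the whitelist.
            $writer->characters($chunk);
        }
    }
}

my $out = '';
my $writer = XML::Writer->new(OUTPUT => \$out);

$writer->startTag('p');
write_marked_up_text($writer,
    'See <a href="http://www.perlmonks.org">the monks</a> & co.');
$writer->endTag('p');
$writer->end;

print $out;
```

The upside is that XML::Writer keeps checking that the tags nest properly, so a stray or unclosed tag in the story text dies loudly instead of producing a broken file.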
So, I'd like to trust that the generated tags in my story text are valid and just toss them into the XML file without using XML::Writer's interface. It's a kludge.
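One way the kludge could be done without leaving XML::Writer entirely is its raw() method, which prints a string with no escaping. The catch is that raw() only works when the writer is created with UNSAFE => 1, and that switch also turns off the module's other well-formedness checks for the whole document. A minimal sketch, with $paragraph standing in for one paragraph of parsed story text:

```perl
use strict;
use warnings;
use XML::Writer;

my $out = '';

# raw() croaks unless UNSAFE is set; UNSAFE also disables the
# writer's nesting and name checks, so nothing is validated.
my $writer = XML::Writer->new(OUTPUT => \$out, UNSAFE => 1);

# Stand-in for a paragraph whose embedded tags are trusted to be valid.
my $paragraph = 'Visit <a href="http://www.perlmonks.org">the Monastery</a>.';

$writer->startTag('p');
$writer->raw($paragraph);    # written as-is: no escaping, no checking
$writer->endTag('p');
$writer->end;

print $out;
```

Anything in the paragraph that should be escaped (a bare & or < in the story text, say) goes through untouched too, so the output is only as well-formed as the input.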
Thanks, but I'm worried about embedded, valid XML tags, not HTML tags (even though my example happens to be an HTML tag.. heheh).
I just don't want XML::Writer to change things like this valid NITF XML tag:
<a href="http://www.perlmonks.org">
to this:
&lt;a href="http://www.perlmonks.org"&gt;