joealba has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a nasty parser to translate news stories from our legacy news story system (ATEX) to NITF XML files. The data that I get from ATEX is inconsistent (to put it mildly), so my parsing is a bit complicated.

In some cases, it is much easier for me to translate commands into NITF XML tags directly in the text. As a simple example, I translate e-mail and web addresses to their tag notations. However, XML::Writer automatically escapes special characters, so those tags come out escaped.

My question: XML::Writer uses IO for writing the actual XML file. Is it safe for me to go behind XML::Writer's back and use the IO object directly to write my data, tags and all? Note that by "safe" I mean "XML::Writer won't puke," not "My XML file is guaranteed to be well-formed."

Replies are listed 'Best First'.
Re: Using XML::Writer to create NITF files, but some tags exist in my data.
by mirod (Canon) on Dec 01, 2001 at 01:59 UTC

    I don't think that XML::Writer does much more than keep track of the currently open element, so as long as you don't play games by opening an element "yourself" and then closing it using XML::Writer, or vice versa, you should be in the clear.
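
A minimal sketch of that pattern, assuming you opened the output handle yourself and passed it to XML::Writer via OUTPUT. The in-memory buffer, the `a` element, and the address are made up for illustration. The raw fragment opens and closes its own element, so XML::Writer's record of the open elements never goes stale:

```perl
use strict;
use warnings;
use XML::Writer;

# Assumption: an in-memory handle stands in for the real output file.
open my $out, '>', \my $buf or die "Cannot open in-memory handle: $!";
my $writer = XML::Writer->new(OUTPUT => $out);

$writer->startTag('p');
$writer->characters('Contact: ');    # goes through XML::Writer, so it is escaped

# Written behind XML::Writer's back: the element is opened AND closed
# here, so the writer still believes only <p> is open.
print {$out} '<a href="mailto:news@example.com">news@example.com</a>';

$writer->endTag('p');                # closes the <p> that the writer opened
$writer->end();
close $out;

print $buf;
```

This relies on XML::Writer printing each tag to the handle as soon as the method is called, which is how it behaves as far as I know; if your version buffers anything, the raw fragment could land in the wrong place.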

Oops...
by joealba (Hermit) on Dec 01, 2001 at 01:32 UTC
    Note: by "write my data" I meant "write only a few special chunks of my data".

    There wouldn't be much point in using XML::Writer if I planned to write all the data out through IO.. heheh.
Re: Using XML::Writer to create NITF files, but some tags exist in my data.
by Fastolfe (Vicar) on Dec 01, 2001 at 02:59 UTC
    I'm curious: What can't you do via XML::Writer that's forcing you to build the tag yourself?
      Well, parsing these nasty text files is quite difficult. So, it would be nice to make simple things easier. It's not a limitation of XML::Writer, it's more a limitation of my input (and my parser).

      Most of the important pieces of the parsed news story wind up stored in keys of a hash -- headline, byline, pubdate, etc. These are plain, unmarked text fields, which makes XML::Writer a simple tool for generating the XML file, tags and all.

      There's also a hash element which stores the text of the story. I use URI::Find and Email::Find to find and set links in this text field. My parser also has to add tags to denote other important areas of the text: sub-headlines, context graphs, etc.

      So my story text field winds up with a few embedded tags. When it comes time to print the paragraphs of my text to the XML file, XML::Writer escapes the gt/lt characters in my tags. I'm sure I could chunk through my text paragraphs and look for these tags -- then use XML::Writer to toss them in (if it's a valid NITF tag). But, I'm on a silly deadline to get this thing done, and I have no help (other than the monks!).

      So, I'd like to trust that the generated tags in my story text are valid and just toss them into the XML file without using XML::Writer's interface. It's a kludge.
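
For what it's worth, releases of XML::Writer newer than the one this thread was written against document a `raw()` method that prints its argument unescaped and unchecked, but only when the writer is created with `UNSAFE => 1`. A hedged sketch of the kludge using it follows; the paragraph text and the in-memory handle are hypothetical, and it is worth checking that your installed version has `raw()` before relying on it:

```perl
use strict;
use warnings;
use XML::Writer;

# Assumption: an in-memory handle stands in for the real output file.
open my $out, '>', \my $buf or die "Cannot open in-memory handle: $!";

# UNSAFE turns off XML::Writer's well-formedness checks; raw() is
# documented as available only in this mode.
my $writer = XML::Writer->new(OUTPUT => $out, UNSAFE => 1);

# Hypothetical story paragraph whose embedded tags are trusted as-is.
my $para = 'Write to <a href="mailto:news@example.com">the desk</a> today.';

$writer->startTag('p');
$writer->raw($para);       # printed verbatim, no escaping of < > &
$writer->endTag('p');
$writer->end();
close $out;

print $buf;
```

The trade-off is that UNSAFE applies to the whole writer, so mismatched tags elsewhere in the document will no longer be caught either.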
        Well, if it's embedded HTML that you are worrying about, then use HTML::TreeBuilder or HTML::TokeParser in conjunction with XML::Writer. It's not hard to do and would be more robust than accepting bad input data.
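
A rough sketch of the HTML::TokeParser route, assuming the embedded markup is simple enough for an HTML tokenizer to handle. The sample paragraph is invented, and only start tags, end tags, and text are handled here; comments, declarations, and empty-element syntax are ignored:

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);
use HTML::TokeParser;
use XML::Writer;

# Hypothetical story paragraph with tags already embedded by the parser.
my $para = 'See <a href="http://www.example.com/">our site</a> for Q&amp;A.';

# Assumption: an in-memory handle stands in for the real output file.
open my $out, '>', \my $buf or die "Cannot open in-memory handle: $!";
my $writer = XML::Writer->new(OUTPUT => $out);
my $parser = HTML::TokeParser->new(\$para) or die "Cannot parse paragraph";

$writer->startTag('p');
while (my $token = $parser->get_token) {
    my ($type, @rest) = @$token;
    if ($type eq 'S') {                      # start tag: replay with attributes
        my ($tag, $attr, $attrseq) = @rest;
        $writer->startTag($tag, map { $_ => $attr->{$_} } @$attrseq);
    }
    elsif ($type eq 'E') {                   # end tag
        $writer->endTag($rest[0]);
    }
    elsif ($type eq 'T') {                   # text: decode, then let the writer re-escape
        $writer->characters(decode_entities($rest[0]));
    }
}
$writer->endTag('p');
$writer->end();
close $out;

print $buf;
```

Because every tag goes through XML::Writer in its default (safe) mode, a mismatched or unclosed tag in the input should make the writer die instead of silently producing a broken file.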

        Yves / DeMerphq
        --
        Have you registered your Name Space?