in reply to Re: Using XML::Writer to create NITF files, but some tags exist in my data.
in thread Using XML::Writer to create NITF files, but some tags exist in my data.

Well, parsing these nasty text files is quite difficult. So, it would be nice to make simple things easier. It's not a limitation of XML::Writer, it's more a limitation of my input (and my parser).

Most of the important pieces of the parsed news story wind up stored in keys of a hash -- headline, byline, pubdate, etc. These are plain, unmarked text fields, which makes XML::Writer a simple tool for generating the XML file, tags and all.

There's also a hash element which stores the text of the story. I use URI::Find and Email::Find to find and set links in this text field. My parser also has to add tags to denote other important areas of the text: sub-headlines, context graphs, etc.

So my story text field winds up with a few embedded tags. When it comes time to print the paragraphs of my text to the XML file, XML::Writer escapes the gt/lt characters in my tags. I'm sure I could chunk through my text paragraphs and look for these tags -- then use XML::Writer to toss them in (if it's a valid NITF tag). But, I'm on a silly deadline to get this thing done, and I have no help (other than the monks!).

So, I'd like to trust that the generated tags in my story text are valid and just toss them into the XML file without using XML::Writer's interface. It's a kludge.
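To make the problem concrete, here is a minimal sketch of what I'm seeing, plus one possible way around it. It assumes a release of XML::Writer that provides the UNSAFE constructor option, the raw() method, and a string reference as OUTPUT; check the documentation of your installed version, since older releases may not have them. The variable names are just for illustration.

```perl
use strict;
use warnings;
use XML::Writer;

my $link = '<a href="http://www.perlmonks.org">';

# Default (safe) mode: characters() escapes markup-significant characters,
# so the embedded tag ends up as literal text in the output.
my $escaped = '';
my $safe = XML::Writer->new(OUTPUT => \$escaped);
$safe->startTag('p');
$safe->characters($link);
$safe->endTag('p');
$safe->end;

# UNSAFE mode (assumed available): raw() writes the string untouched.
# Nothing is checked, so the markup must already be well-formed.
my $verbatim = '';
my $unsafe = XML::Writer->new(OUTPUT => \$verbatim, UNSAFE => 1);
$unsafe->startTag('p');
$unsafe->raw($link . 'PerlMonks</a>');
$unsafe->endTag('p');
$unsafe->end;

print "$escaped\n$verbatim\n";
```

The trade-off is the one described above: raw() trusts the story text completely, so a stray or unclosed tag goes straight into the NITF file.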

Re: Re: Re: Using XML::Writer to create NITF files, but some tags exist in my data.
by demerphq (Chancellor) on Dec 03, 2001 at 20:31 UTC
    Well, if it's embedded HTML that you are worried about, then use HTML::TreeBuilder or HTML::TokeParser in conjunction with XML::Writer. It's not hard to do, and it would be more robust than accepting bad input data.

    Yves / DeMerphq
    --
    Have you registered your Name Space?

      Thanks, but I'm worried about embedded, valid XML tags, not HTML tags (even though my example happens to be an HTML tag.. heheh).

      I just don't want XML::Writer to change things like this valid NITF XML tag: <a href="http://www.perlmonks.org"> to this: &lt;a href=&quot;http://www.perlmonks.org&quot;&gt;
        Well, perhaps I should have been more specific. The idea I had in mind was that you would use one of these (or their XML cousins) to process your various data. Callbacks or the equivalent would trigger XML::Writer to spit out a duplicate tag. As I said, it shouldn't be too difficult to code. My experience with HTML::TreeBuilder suggests that you could use XML::TreeBuilder and the look_down function with a code ref as a simple callback that uses XML::Writer to achieve the desired results (and that also lets you validate the types of tag you allow to be embedded).
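A rough sketch of that idea, using the token loop of HTML::TokeParser rather than a tree and look_down, since it is the shortest to show. The %allowed whitelist, the write_story_text helper, and the sample text are all made up for illustration; a real whitelist would come from the NITF DTD. Note that HTML::TokeParser lowercases tag names, which is assumed to be acceptable here.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(decode_entities);
use XML::Writer;

# Hypothetical whitelist of tags allowed inside the story text.
my %allowed = map { $_ => 1 } qw(a em hl2);

# Re-emit the embedded markup through XML::Writer, one token at a time.
sub write_story_text {
    my ($writer, $text) = @_;
    my $parser = HTML::TokeParser->new(\$text)
        or die "Cannot parse story text";
    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        if ($type eq 'S') {                      # start tag
            my ($tag, $attr, $attrseq) = @{$token}[1, 2, 3];
            if ($allowed{$tag}) {
                # Attribute values arrive decoded; the writer re-escapes them.
                my @pairs = map { ($_ => $attr->{$_}) }
                            grep { $_ ne '/' } @$attrseq;
                $writer->startTag($tag, @pairs);
            } else {
                $writer->characters($token->[4]);  # unknown tag: keep as text
            }
        }
        elsif ($type eq 'E') {                   # end tag
            if ($allowed{ $token->[1] }) { $writer->endTag($token->[1]) }
            else                         { $writer->characters($token->[2]) }
        }
        elsif ($type eq 'T') {                   # text, entities not yet decoded
            my $chars = $token->[2] ? $token->[1] : decode_entities($token->[1]);
            $writer->characters($chars);
        }
    }
}

my $xml = '';
my $writer = XML::Writer->new(OUTPUT => \$xml);
$writer->startTag('p');
write_story_text($writer,
    'See <a href="http://www.perlmonks.org">PerlMonks</a> &amp; <blink>more</blink>.');
$writer->endTag('p');
$writer->end;
print $xml;
```

Because the writer stays in its default safe mode, a mismatched or unclosed whitelisted tag makes XML::Writer die instead of producing a broken file, which is the robustness argument for this route.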

        A second approach (which I am hesitant to suggest, but you are an adult, even if an adult votebot :-) would be to pass an IO::Scalar object as the OUTPUT parameter. That way the generated XML lands in a string, and you can do whatever you want to that string with little or no worry about what XML::Writer does behind closed doors.
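Something along these lines, as a sketch only. It assumes IO::Scalar (from the IO-stringy distribution) is installed, and the %trusted list and the un-escaping regex are my own illustration of "doing whatever you want to the string", not anything XML::Writer provides.

```perl
use strict;
use warnings;
use IO::Scalar;
use XML::Writer;

# Capture everything XML::Writer prints in $xml instead of a file.
my $xml = '';
my $out = IO::Scalar->new(\$xml);
my $writer = XML::Writer->new(OUTPUT => $out);

$writer->startTag('p');
$writer->characters(
    'Visit <a href="http://www.perlmonks.org">PerlMonks</a>, not <blink>this</blink>.');
$writer->endTag('p');
$writer->end;
$out->close;

# Hypothetical list of tags whose escaping we are willing to undo.
my %trusted = map { $_ => 1 } qw(a);

# Turn &lt;tag ...&gt; back into <tag ...> for trusted tags only;
# everything else stays escaped exactly as XML::Writer wrote it.
$xml =~ s{&lt;(/?)(\w+)(.*?)&gt;}{
    my ($slash, $name, $rest) = ($1, $2, $3);
    (my $attrs = $rest) =~ s/&quot;/"/g;
    $trusted{$name} ? "<$slash$name$attrs>" : "&lt;$slash$name$rest&gt;";
}gse;

print $xml;
```

This is very much the kludge route: nothing checks that the restored tags are balanced, so it only makes sense if the parser that inserted them can be trusted.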

        Still, I think I would go with the callback if only out of sheer paranoia and future flexibility.

        Yves / DeMerphq
        --
        Have you registered your Name Space?