butlerdi has asked for the wisdom of the Perl Monks concerning the following question:

I have an input stream containing XML (1 line). I don't want to use SAX or DOM to parse it (overkill). I tokenise the stuff and create a hashmap of the value pairs. But after processing the request I need to return the XML with a timestamp in a <TS></TS> field. I'm looking for something that would allow me to change the value for the tag. I use a loop over my hashmap now to rebuild the XML, but it's pretty ugly. Just can't get anything to work. TIA, dave

Replies are listed 'Best First'.
Re: Regex et XML
by perrin (Chancellor) on Feb 25, 2004 at 21:47 UTC
    Why not just use XML::Simple like the rest of us? It's a lot less work than what you're doing now.
      Technically XML::Simple operates as a kind of DOM parser, if not actually one (evidently with some SAX or SAX-like options if you want those), but I agree with you. It's canon that parsing XML with regexes is a path to destruction.

      Perhaps the OP has severe memory concerns (i.e. an overbloated giant XML file)? If so, writing a SAX parser only to remove pieces of XML would be painful, and DOM would not be a good choice. Yet line-oriented parsing isn't going to work with XML anyway, so you are slurping, hence memory issues again. Yep, it would be best to pick one or the other (DOM-ish or SAX-ish), despite the tradeoffs. Maintaining Yet-Another-XML-Manipulator would be quite painful. If the file is small by machine standards, absolutely, XML::Simple is the easiest way to go. Do it, and you can still think mostly in Perl!

        I agree. I think most projects that start off with homebrewed regex XML parsers tend to work fine at first. When the project starts to mature, you end up writing more conditionals into your parser. Over time you will find that you have just written something that is not as pretty, not as flexible, and not as supportable as XML::Simple. For most projects it makes sense to bail out early and use one of the XML parsers/manipulators on CPAN. Even when I have a very strong feeling my projects will not creep in terms of what I do with XML, I tend to use a CPAN module for the manipulation and parsing.


        -Waswas
Re: Regex et XML
by mirod (Canon) on Feb 26, 2004 at 09:12 UTC
    I don't want to use SAX or DOM to parse (Overkill)

    So instead of using something that parses XML and that makes life easy for you, you'd rather spend time writing something that parses some format that looks like XML (see On XML parsing to see why you probably won't cover the entire spec)? Does this seem like sound software engineering practice to you?

    Some very knowledgeable people do indeed use regexps to parse XML: people who _really_ know what they are doing, why they are using regexps (they need the speed) and when to use them (the XML is in a known format). Before getting to that level, and needing the speed, I really think it's much safer, not to mention easier, to use a parser.
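To make the "known format" caveat concrete, here is a minimal sketch (not from the original post) of a naive substitution meeting input that is still well-formed XML but not in the expected shape. The helper name `naive_stamp` and the sample messages are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrite the first <TS>...</TS> found in the string.
sub naive_stamp {
    my ($xml, $value) = @_;
    $xml =~ s{<TS>.*?</TS>}{<TS>$value</TS>}s;
    return $xml;
}

# The expected format: the substitution does what was intended.
my $plain = '<msg><TS>Update</TS></msg>';

# Well-formed, but the first textual match sits inside a comment,
# so the comment is edited and the real element is left alone.
my $commented = '<msg><!-- <TS>old</TS> --><TS>Update</TS></msg>';

# An attribute on the tag means the pattern never matches at all.
my $with_attr = '<msg><TS zone="UTC">Update</TS></msg>';

print naive_stamp($_, 'NOW'), "\n" for $plain, $commented, $with_attr;
```

Only the first message comes out as intended; the second has its comment rewritten and the third is returned unchanged. A parser handles all three, which is the trade-off being described.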

    You would have to show us an example of the data if you want help here. XML::Simple might or might not be what you need, BTW: its output is often quite different from the input.

    This might help you, should you choose to go the overkill way ;--):

    #!/usr/bin/perl -w
    use strict;
    use XML::Twig;
    use YAML;

    my %hash;

    XML::Twig->new(
        twig_handlers => {
            elt => sub {
                $hash{ $_->field('key') } = $_->field('value');
                $_->insert_new_elt( first_child => ts => scalar localtime );
                $_[0]->flush;    # flushes the doc to use less memory (might be overkill ;--)
            },
        },
        pretty_print => 'record_c',
    )
      ->parse( \*DATA )
      ->flush;                   # to flush the closing tag for doc

    print "\n\nHASH:\n", Dump \%hash;

    __DATA__
    <doc>
      <elt><key>key1</key><value>value1</value></elt>
      <elt><key>key2</key><value>value2</value></elt>
      <elt><key>key3</key><value>value3</value></elt>
      <elt><key>key4</key><value>value4</value></elt>
    </doc>
      Thanx for your response. The reason for wanting to use regex is that the XML being parsed is of no interest to the program (a message handler); I am merely passing it from one system to another (occasionally connected P2P devices). The format and content never change, and the footprint and memory capabilities of the device are very limited. Even nanoxml is a bit heavy here. All I am really looking to do is to replace a value "Update" with the current time or in some cases a session id.
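Under exactly those constraints (fixed format, no parser available), the replacement could be sketched with core Perl only, as below. This is an assumption-laden illustration, not a general XML tool: the helper `set_field`, the sample message and the timestamp format are all hypothetical, and the pattern assumes the tag has no attributes, is not nested inside itself, and never appears inside a comment or CDATA section.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: replace the text content of one simple element.
# Returns the modified string, or undef if the tag was not found.
sub set_field {
    my ($xml, $tag, $value) = @_;

    # Escape the characters that are special in XML text content.
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    $value =~ s/([&<>])/$entity{$1}/g;

    # Non-greedy match up to the first closing tag; /s lets . cross newlines.
    my $count = ($xml =~ s{<\Q$tag\E>.*?</\Q$tag\E>}{<$tag>$value</$tag>}s);

    return $count ? $xml : undef;
}

# Assumed message shape; the real format is not shown in the thread.
my $msg   = '<msg><id>42</id><TS>Update</TS><body>hello</body></msg>';
my $stamp = strftime('%Y-%m-%dT%H:%M:%SZ', gmtime);

my $out = set_field($msg, 'TS', $stamp);
print defined $out ? "$out\n" : "no <TS> element found\n";
```

The same helper would cover the session id case by passing a different value. Returning undef when nothing matched gives the message handler a cheap way to notice that the format did change after all.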
Re: Regex et XML
by inman (Curate) on Feb 26, 2004 at 11:35 UTC
    You could apply an XSLT stylesheet to the original XML. The purpose of XSLT is to transform XML. In this particular example you want to perform a reasonably neutral transform and insert timestamp elements as you go.

    This template does a null (identity) transform: it copies every attribute and node through unchanged.

    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>

    Assuming that you want to recurse over a number of top level nodes, you would use a construct like:

    <xsl:for-each select="MyNodes">
      <xsl:sort select="@title"/><!--optional sort-->
      <xsl:apply-templates select="."/>
      <ts><xsl:value-of select="$timestamp"/></ts>
    </xsl:for-each>

    Here $timestamp is a parameter that has been passed to the XSLT engine by your Perl app.

Re: Regex et XML
by ambrus (Abbot) on Feb 25, 2004 at 22:31 UTC