For a project I'm working on, I need a higher-level API that can access and modify subtrees in an XML document. I was thinking of providing the plugin API as a SAX filter so that it could be part of a larger SAX machine (i.e. a pipeline that later processes the stream with XML::STX or whatever else).

The taglib plugin API: in brief, I was thinking of registering a sub with the taglib for each tag and passing a subtree to each tag sub.
The structure returned by the tag sub is placed back into the stream as SAX events, so that those events can in turn be processed by the taglib.

For example:
package My::Taglib;
use strict;
use warnings;
use base qw(XML::Filter::Foo);    # name of filter yet to be decided

sub new {
    my $class = shift;
    my $self  = bless {}, $class;
    # must match the xmlns:foo URI used in the document below
    $self->register_ns(qw(http://www.iordy.com/ns/foo));
    $self->register_tags(qw(node hello));
    return $self;    # a constructor has to return the object explicitly
}

sub tag_node {
    # gets access to all children, may do some processing based on them,
    # and then adds <foo:hello /> below <title>foo</title>
}

sub tag_hello {
    # replaces <foo:hello /> with <hello>world</hello>
}

1;

__DATA__
<data>
  <foo:node xmlns:foo="http://www.iordy.com/ns/foo">
    <title>foo</title>
  </foo:node>
</data>
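The registration and dispatch described above could be sketched in plain core Perl, so it runs without any XML modules. Everything in it is hypothetical: `Taglib::Dispatch`, `register_tag`, `expand` and the hash-based node layout are made-up names, not an existing CPAN API. Tag subs are keyed by namespace URI plus local name, and whatever a sub returns is expanded again, so the taglib processes its own output; a depth limit stops a sub that keeps returning its own tag. The example assumes `tag_node` drops the `foo:node` wrapper and keeps its children.

```perl
#!/usr/bin/perl
# Hypothetical sketch of per-tag dispatch: tag subs are registered by
# namespace URI + local name, and whatever a sub returns is expanded again.
use strict;
use warnings;

package Taglib::Dispatch;

sub new { my ($class) = @_; return bless { handlers => {} }, $class }

sub register_tag {
    my ($self, $ns, $local, $code) = @_;
    $self->{handlers}{ '{' . $ns . '}' . $local } = $code;
    return $self;
}

# Returns a list of nodes: a tag sub may replace one element with several.
# $depth counts nested tag-sub calls and stops a sub that keeps returning
# its own tag from recursing forever.
sub expand {
    my ($self, $node, $depth) = @_;
    $depth //= 0;
    die "tag expansion nested too deeply\n" if $depth > 50;
    return $node unless $node->{Type} eq 'element';
    my $key = '{' . ($node->{NamespaceURI} // '') . '}' . $node->{LocalName};
    if (my $code = $self->{handlers}{$key}) {
        # The sub's output goes back through the taglib.
        return map { $self->expand($_, $depth + 1) } $code->($node);
    }
    my %copy = %$node;
    $copy{Children} = [ map { $self->expand($_, $depth) } @{ $node->{Children} || [] } ];
    return \%copy;
}

package main;

my $NS = 'http://www.iordy.com/ns/foo';

# Tiny node constructors for the example (hypothetical layout).
sub el {
    my ($name, $ns, $local, @kids) = @_;
    return { Type => 'element', Name => $name, NamespaceURI => $ns,
             LocalName => $local, Attributes => {}, Children => \@kids };
}
sub txt { return { Type => 'text', Data => $_[0] } }

# Serialiser for checking the result (no escaping: illustration only).
sub to_string {
    my ($n) = @_;
    return $n->{Data} if $n->{Type} eq 'text';
    my $out = "<$n->{Name}";
    for my $att (sort { $a->{Name} cmp $b->{Name} } values %{ $n->{Attributes} || {} }) {
        $out .= qq( $att->{Name}="$att->{Value}");
    }
    my $kids = join '', map { to_string($_) } @{ $n->{Children} || [] };
    return length $kids ? "$out>$kids</$n->{Name}>" : "$out/>";
}

my $taglib = Taglib::Dispatch->new;
# tag_node: keep the children and add <foo:hello/> after them
# (assumes the foo:node wrapper itself is dropped).
$taglib->register_tag($NS, 'node', sub {
    my ($node) = @_;
    return (@{ $node->{Children} }, el('foo:hello', $NS, 'hello'));
});
# tag_hello: replace <foo:hello/> with <hello>world</hello>.
$taglib->register_tag($NS, 'hello', sub { return el('hello', '', 'hello', txt('world')) });

my $doc = el('data', '', 'data',
             el('foo:node', $NS, 'node', el('title', '', 'title', txt('foo'))));
my ($out) = $taglib->expand($doc);
print to_string($out), "\n";   # <data><title>foo</title><hello>world</hello></data>
```

Because the output of each tag sub is fed back through `expand`, `tag_node` can emit `<foo:hello/>` and rely on `tag_hello` to handle it, which is the chaining the plugin API is meant to allow.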

I think I can cope with the buffering for subdocuments etc., but I can't find a CPAN module that implements a simple enough structure for the tag sub to modify. To clarify: if I build a subtree with XML::Simple, I lose the distinction between what was an attribute and what was a child element (don't I?) when I generate SAX events from the result the tag sub returns, and this will affect downstream handlers. Also, it would be much faster if, instead of buffering the events, I built a subtree as I go and delivered parts of it (or smaller subtrees as I hit end tags) until all open tags have been closed and the buffer can be removed.
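The build-as-you-go idea could look roughly like this: a handler that receives Perl SAX2-style event hashes (`Name`, `LocalName`, `Prefix`, `NamespaceURI`, and `Attributes` keyed as `{ns}local`), keeps attributes in their own hash apart from the children, and calls a callback once the stack of open elements is empty again. `SubtreeBuilder`, `on_complete` and the `Type` field are hypothetical names. In a real filter this would presumably inherit from XML::SAX::Base and pass events outside a registered tag straight downstream; this sketch simply ignores them.

```perl
#!/usr/bin/perl
# Hypothetical sketch: build a subtree from Perl SAX2-style events as they
# arrive and pass it to a callback once every open element has been closed.
use strict;
use warnings;

package SubtreeBuilder;

sub new {
    my ($class, %args) = @_;
    return bless { on_complete => $args{on_complete}, stack => [], in_cdata => 0 }, $class;
}

sub start_element {
    my ($self, $el) = @_;
    my $node = {
        Type         => 'element',
        Name         => $el->{Name},
        LocalName    => $el->{LocalName},
        Prefix       => $el->{Prefix},
        NamespaceURI => $el->{NamespaceURI},
        # Attributes live in their own hash (keyed "{ns}local" as in Perl SAX2),
        # so they never get mixed up with child elements.
        Attributes   => { %{ $el->{Attributes} || {} } },
        Children     => [],
    };
    $self->_append($node) if @{ $self->{stack} };
    push @{ $self->{stack} }, $node;
}

sub end_element {
    my ($self) = @_;
    my $node = pop @{ $self->{stack} };
    # Stack empty again: the subtree is complete and the buffer can go.
    $self->{on_complete}->($node) unless @{ $self->{stack} };
}

sub characters {
    my ($self, $c) = @_;
    return unless @{ $self->{stack} };    # events outside a subtree are ignored here
    my $kids = $self->{stack}[-1]{Children};
    if ($self->{in_cdata} || (@$kids && $kids->[-1]{Type} eq 'text')) {
        $kids->[-1]{Data} .= $c->{Data};  # extend the open CDATA or text node
    }
    else {
        push @$kids, { Type => 'text', Data => $c->{Data} };
    }
}

sub comment {
    my ($self, $c) = @_;
    $self->_append({ Type => 'comment', Data => $c->{Data} }) if @{ $self->{stack} };
}

sub start_cdata {
    my ($self) = @_;
    $self->{in_cdata} = 1;
    $self->_append({ Type => 'cdata', Data => '' }) if @{ $self->{stack} };
}

sub end_cdata { $_[0]{in_cdata} = 0 }

sub _append { my ($self, $node) = @_; push @{ $self->{stack}[-1]{Children} }, $node }

package main;

my $NS = 'http://www.iordy.com/ns/foo';
my @done;
my $builder = SubtreeBuilder->new(on_complete => sub { push @done, $_[0] });

# Events roughly as a namespace-aware parser would report them for
# <foo:node><title lang="en">foo</title><!-- keep me --><![CDATA[<raw>]]></foo:node>
# (the xmlns:foo declaration is left out for brevity).
$builder->start_element({ Name => 'foo:node', LocalName => 'node', Prefix => 'foo',
                          NamespaceURI => $NS, Attributes => {} });
$builder->start_element({ Name => 'title', LocalName => 'title', Prefix => '',
                          NamespaceURI => '', Attributes => {
    '{}lang' => { Name => 'lang', LocalName => 'lang', Prefix => '',
                  NamespaceURI => '', Value => 'en' } } });
$builder->characters({ Data => 'fo' });   # parsers may split text into chunks
$builder->characters({ Data => 'o' });
$builder->end_element({ Name => 'title' });
$builder->comment({ Data => ' keep me ' });
$builder->start_cdata({});
$builder->characters({ Data => '<raw>' });
$builder->end_cdata({});
$builder->end_element({ Name => 'foo:node' });
```

The callback fires only on the final `end_element`, so a tag sub would receive the whole `foo:node` subtree with the attribute, comment and CDATA section still distinguishable.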

I also thought about converting the subtree back to a string so that each taglib could use a parser of its choice, or simply modify the text, but this seems like a bad idea: if taglibs are implemented by different developers, you'll end up installing half of CPAN just to meet the dependencies.

What I'd like to use or create is a simple Perl structure like XML::Simple's for the subtree, but one that maintains the original structure (namespaces, CDATA, comments, children vs. attributes) and that the tag sub could modify or simply return.
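Going the other way, a tree that keeps those distinctions can be replayed as SAX events for the next handler in the pipeline. This is a hypothetical sketch: `emit_tree`, the `Type` field and the node layout (element / text / cdata / comment hashes) are assumptions, and it does not emit `start_prefix_mapping`/`end_prefix_mapping` events, which a namespace-aware pipeline would also need. The small `Recorder` class only logs the calls so the event order can be checked.

```perl
#!/usr/bin/perl
# Hypothetical sketch: replay a tree as Perl SAX2-style events.
# Assumed node layout: { Type => 'element' | 'text' | 'cdata' | 'comment', ... }
use strict;
use warnings;

sub emit_tree {
    my ($node, $h) = @_;
    my $type = $node->{Type};
    if ($type eq 'element') {
        my %el = (Name => $node->{Name}, LocalName => $node->{LocalName},
                  Prefix => $node->{Prefix}, NamespaceURI => $node->{NamespaceURI});
        # Attributes go back out as attributes, never as child elements.
        $h->start_element({ %el, Attributes => $node->{Attributes} || {} });
        emit_tree($_, $h) for @{ $node->{Children} || [] };
        $h->end_element({ %el });
    }
    elsif ($type eq 'text') {
        $h->characters({ Data => $node->{Data} });
    }
    elsif ($type eq 'cdata') {
        # Wrapping the text keeps the CDATA section intact downstream.
        $h->start_cdata({});
        $h->characters({ Data => $node->{Data} });
        $h->end_cdata({});
    }
    elsif ($type eq 'comment') {
        $h->comment({ Data => $node->{Data} });
    }
    else {
        die "unknown node type '$type'\n";
    }
}

package Recorder;    # logs each event so the output can be checked
sub new           { my ($class) = @_; return bless { log => [] }, $class }
sub start_element { push @{ $_[0]{log} }, "start $_[1]{Name}" }
sub end_element   { push @{ $_[0]{log} }, "end $_[1]{Name}" }
sub characters    { push @{ $_[0]{log} }, "chars $_[1]{Data}" }
sub comment       { push @{ $_[0]{log} }, "comment $_[1]{Data}" }
sub start_cdata   { push @{ $_[0]{log} }, 'start_cdata' }
sub end_cdata     { push @{ $_[0]{log} }, 'end_cdata' }

package main;

my $tree = {
    Type => 'element', Name => 'hello', LocalName => 'hello', Prefix => '',
    NamespaceURI => '', Attributes => {},
    Children => [
        { Type => 'comment', Data => 'c' },
        { Type => 'text',    Data => 'world' },
        { Type => 'cdata',   Data => '<x>' },
    ],
};
my $rec = Recorder->new;
emit_tree($tree, $rec);
print join("\n", @{ $rec->{log} }), "\n";
```

Since any object with these methods can receive the events, the same routine could feed an XML::SAX::Base-derived filter or a writer further down the pipeline.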

XML::Twig seems to have the sort of interface I want, but it can't be used as a SAX filter, only as a generator (at least in the version I saw). I looked at using XML::Twig for the whole project instead of SAX, but I'd like to stay with SAX so that my taglibs can be used in pipelines with other SAX tools like XML::STX.

This is where I am now, and I'd like to ask for the wisdom of the monks, as I'm not the world's best programmer and I don't know anybody else to run this past. Is there a project I have missed that implements something like this? Is it realistic, or am I just asking too much?

In reply to Higher level API for XML plugins? by IOrdy