bobf has asked for the wisdom of the Perl Monks concerning the following question:

I am still trying to wrap my head around XML::Twig and I could really use a nudge in the right direction. The docs just aren't making it through the fog and into my head today.

I am trying to parse an XMI file that has the following overall structure (greatly simplified, of course):

<XMI xmlns:UML="omg.org/UML1.3" xmi.version="1.1" timestamp="2007-06-26...">
  <XMI.content>
    <UML:Class name="Address" xmi.id="EAID_009DB5F9_3A9D_44d0_88BD_...">
      <UML:ModelElement.taggedValue>
        <UML:TaggedValue tag="isSpecification" value="false" />
        <UML:TaggedValue tag="ea_stype" value="Class" />
      </UML:ModelElement.taggedValue>
      <UML:Classifier.feature>
        <UML:Attribute name="city" changeable="none" >
          <UML:ModelElement.taggedValue>
            <UML:TaggedValue tag="description" value="City" />
            <UML:TaggedValue tag="ea_guid" xmi.id="EAID_6616144D_..." />
          </UML:ModelElement.taggedValue>
        </UML:Attribute>
      </UML:Classifier.feature>
    </UML:Class>
    <UML:TaggedValue tag="created"
        xmi.id="EAID_17DA671B_9257_42f1_8AA8_0EF5F305DBD0"
        modelElement="EAID_31EB028E_30B2_430a_9168_1010A2A7B851" />
    <UML:TaggedValue tag="ea_stype"
        xmi.id="EAID_FAF2F308_9EB4_4f83_90F6_BE8ABC10087C"
        modelElement="EAID_31EB028E_30B2_430a_9168_1010A2A7B851" />
  </XMI.content>
</XMI>

The handlers I created are as follows:

my $twig = XML::Twig->new(
    start_tag_handlers => {
        'UML:Class'   => \&uml_class_start,
        'UML:Package' => \&uml_package_start,
    },
    twig_handlers => {
        # purge at the end of each class section
        'UML:Class'          => sub { $_[0]->purge },
        'UML:Attribute'      => \&uml_attribute,
        'UML:Generalization' => \&uml_generalization,
        'UML:TaggedValue'    => \&uml_taggedvalue,
    },
);

I need to get the value of different tags, but I suspect that I need different handlers for each, depending on the context. For example,

  1. I need to get the name and xmi.id for each class, which I do in uml_class_start() using
    my $xmi_id = $elt->att( 'xmi.id' );
    my $name   = $elt->att( 'name' );
    I don't need any of the UML:TaggedValue tags.
  2. I need the name and xmi.id for each attribute. Getting the name is done in uml_attribute() (which is easy, see the class example, above), but note that the xmi.id is one of the UML:TaggedValue tags (tag="ea_guid").
  3. At the end of the file there are a bunch of UML:TaggedValue tags that are not located within a Class or Attribute block. I need to grab the tag, xmi.id, value (not shown in the XMI snippet), and modelElement values for each (or, more specifically, for a given set of modelElement values). I do that in uml_taggedvalue() using the att() method, as in the class example.

How can I tell XML::Twig to ignore UML:TaggedValue tags in Classes, handle them one way in Attributes, and handle them a different way if they are not in either Classes or Attributes? (There are a couple of additional cases, too, but I can solve those by generalizing the solution to this question.)

Thanks very much, in advance.

Re: Creating context-specific handlers in XML::Twig
by grinder (Bishop) on Nov 08, 2007 at 23:52 UTC
    How can I tell XML::Twig to ignore UML:TaggedValue tags in Classes, handle them one way in Attributes, and handle them a different way if they are not in either Classes or Attributes?

    Ignoring them is just a question of handling them by doing nothing. So you really only set a twig_handler to look at UML:TaggedValue elements, and within that, use the path method to look at your ancestor chain to determine whether you unleash the Attributes treatment or else the not_a_Class treatment.

    Is that enough for you to get going or do you need a snippet?
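
    For anyone who does want a snippet, here is a minimal sketch of the path() check described above. The sample XML is a trimmed, hypothetical version of the XMI from the question (the short ids C1, A1, T1 and M1 are made up), and @seen is only an illustrative collector, not part of XML::Twig.

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed, hypothetical sample modelled on the XMI in the question.
my $xml = <<'XML';
<XMI xmlns:UML="omg.org/UML1.3">
  <XMI.content>
    <UML:Class name="Address" xmi.id="C1">
      <UML:ModelElement.taggedValue>
        <UML:TaggedValue tag="ea_stype" value="Class" />
      </UML:ModelElement.taggedValue>
      <UML:Classifier.feature>
        <UML:Attribute name="city">
          <UML:ModelElement.taggedValue>
            <UML:TaggedValue tag="ea_guid" xmi.id="A1" />
          </UML:ModelElement.taggedValue>
        </UML:Attribute>
      </UML:Classifier.feature>
    </UML:Class>
    <UML:TaggedValue tag="created" xmi.id="T1" modelElement="M1" />
  </XMI.content>
</XMI>
XML

my @seen;    # illustrative collector for what each branch handled

sub uml_taggedvalue {
    my ($twig, $elt) = @_;

    # path() returns the ancestor chain as a string, e.g.
    # /XMI/XMI.content/UML:TaggedValue
    my $path = $elt->path;
    my $tag  = $elt->att('tag');

    if ($path =~ m{/UML:Attribute/}) {
        push @seen, "attribute:$tag";    # the Attributes treatment
    }
    elsif ($path =~ m{/UML:Class/}) {
        # inside a Class but not an Attribute: handle by doing nothing
    }
    else {
        push @seen, "top-level:$tag";    # neither Class nor Attribute
    }
}

my $twig = XML::Twig->new(
    twig_handlers => { 'UML:TaggedValue' => \&uml_taggedvalue },
);
$twig->parse($xml);

print "$_\n" for @seen;
```

    The Attribute test has to come first, because an Attribute's TaggedValue also has UML:Class further up its path.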

    • another intruder with the mooring in the heart of the Perl

      Thanks for the reply and for the pointer to path(). I ran a few tests and I think I could make that work.

      Is this really the best way to approach this problem, though? I am not trying to be argumentative; I am just a little surprised because it was not what I was expecting.

      This approach requires that I write a lot of

      if( $child_of_some_element ) {
          # do this
      }
      elsif( $child_of_some_other_element ) {
          # do something else
      }

      in several different handlers, which seems pretty high maintenance - if the structure of a Class element (for example) changes I have to make sure all of the handlers are changed appropriately.

      I was hoping there would be a way to approach this that is a little more OO-like, where the child element wouldn't have to know what the parent element was in order to do the Right Thing. Ideally, I'd like to think about parsing this file in chunks: parse Class elements this way, parse Attribute elements that way, etc, and each chunk (think "object") could have its own TaggedValue handler (for example).

      Perhaps I am thinking about this in the wrong way and over-complicating matters, or my expectation for how to solve this problem is a Bad Idea.

      Thoughts?

Re: Creating context-specific handlers in XML::Twig
by mirod (Canon) on Nov 09, 2007 at 09:52 UTC

    First a generic comment on questions: if you want a somewhat complete solution, you need to give _all_ of the relevant data. If I can write a test case from the information you give, then I can start really working on it. If the test case is already written, then that's even better BTW.

    Here you mention that some of the data is missing, so I can hopefully give you "a nudge in the right direction", but no more, and I can't even really understand properly what you want to do if I know that I am missing some pieces of the puzzle.

    With that out of the way... let's see if I can give you what you asked for ;--)

    The expressions that trigger handlers are just that: expressions. They are not limited to just tag names. Look for twig_handlers in the docs (badly formatted version here). So you can certainly use an expression like UML:Attribute/UML:ModelElement.taggedValue/UML:TaggedValue[@tag="ea_guid"] to trigger a handler just when you need it.

    In order to ignore the rest of the UML:TaggedValue elements, you can just not do anything with them, or if you want to actively prevent them from being included in the document tree, use the ignore method, but that's probably not worth the effort.
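
    As a rough sketch of how that could look (not tested against the real file): the first handler uses the expression above, while the second expression, XMI.content/UML:TaggedValue, is my assumption about where the trailing elements live, based on the snippet in the question. The sample XML is trimmed and its short ids are made up; @seen is just an illustrative collector.

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed, hypothetical sample modelled on the XMI in the question.
my $xml = <<'XML';
<XMI xmlns:UML="omg.org/UML1.3">
  <XMI.content>
    <UML:Class name="Address" xmi.id="C1">
      <UML:ModelElement.taggedValue>
        <UML:TaggedValue tag="ea_stype" value="Class" />
      </UML:ModelElement.taggedValue>
      <UML:Classifier.feature>
        <UML:Attribute name="city">
          <UML:ModelElement.taggedValue>
            <UML:TaggedValue tag="description" value="City" />
            <UML:TaggedValue tag="ea_guid" xmi.id="A1" />
          </UML:ModelElement.taggedValue>
        </UML:Attribute>
      </UML:Classifier.feature>
    </UML:Class>
    <UML:TaggedValue tag="created" xmi.id="T1" modelElement="M1" />
  </XMI.content>
</XMI>
XML

my @seen;    # illustrative collector

my $twig = XML::Twig->new(
    twig_handlers => {
        # Triggered only by the ea_guid TaggedValue inside an Attribute.
        'UML:Attribute/UML:ModelElement.taggedValue/UML:TaggedValue[@tag="ea_guid"]' => sub {
            my ($t, $elt) = @_;
            push @seen, 'attribute xmi.id: ' . $elt->att('xmi.id');
        },

        # Triggered only by TaggedValue elements directly under XMI.content.
        'XMI.content/UML:TaggedValue' => sub {
            my ($t, $elt) = @_;
            push @seen, 'top-level tag: ' . $elt->att('tag');
        },

        # No expression matches the TaggedValue elements of the Class
        # itself, so nothing is done with them.
    },
);
$twig->parse($xml);

print "$_\n" for @seen;
```

    Each handler then only ever sees the elements it cares about, so none of them needs an if/elsif on its context.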

    Does that help?

      Thanks for the suggestions. I know the OP was a little sketchy on detail, but that was intentional. I hoped that the snippets of XML and code would give readers enough of a feel for how I was approaching the problem to understand my train(wreck) of thought, and to the extent that I did not achieve that I apologize.

      At this stage I am more interested in understanding the pros and cons to different approaches to the problem than the implementation details. That said, I would welcome any example code that you or any others are willing to provide, and to that end I am working on creating example input, output, and test code. I will update this thread when it is ready.

      Re: twig handlers being expressions rather than simple tag names. Thank you for pointing that out - I never realized that there was so much flexibility. I will definitely experiment with it.

      In summary, thanks for the reply. Between your comments and the example that GrandFather provided, I am tweaking my code to see if I can clean up the flow a bit. I will post an example soon.

Re: Creating context-specific handlers in XML::Twig
by GrandFather (Saint) on Nov 09, 2007 at 07:03 UTC

    Maybe what you want is to dispatch to a sub-handler based on context? Something like:

    use strict;
    use warnings;
    use XML::Twig;

    my $xml = <<'XML';
    <XMI xmlns:UML="omg.org/UML1.3" xmi.version="1.1" timestamp="2007-06-26...">
      <XMI.content>
        <UML:Class name="Address" xmi.id="EAID_009DB5F9_3A9D_44d0_88BD_...">
          <UML:ModelElement.taggedValue>
            <UML:TaggedValue tag="isSpecification" value="false" />
            <UML:TaggedValue tag="ea_stype" value="Class" />
          </UML:ModelElement.taggedValue>
          <UML:Classifier.feature>
            <UML:Attribute name="city" changeable="none" >
              <UML:ModelElement.taggedValue>
                <UML:TaggedValue tag="description" value="City" />
                <UML:TaggedValue tag="ea_guid" xmi.id="EAID_6616144D_..." />
              </UML:ModelElement.taggedValue>
            </UML:Attribute>
          </UML:Classifier.feature>
        </UML:Class>
        <UML:TaggedValue tag="created"
            xmi.id="EAID_17DA671B_9257_42f1_8AA8_0EF5F305DBD0"
            modelElement="EAID_31EB028E_30B2_430a_9168_1010A2A7B851" />
        <UML:TaggedValue tag="ea_stype"
            xmi.id="EAID_FAF2F308_9EB4_4f83_90F6_BE8ABC10087C"
            modelElement="EAID_31EB028E_30B2_430a_9168_1010A2A7B851" />
      </XMI.content>
    </XMI>
    XML

    my $twig = XML::Twig->new(
        start_tag_handlers => {
            'UML:Class'   => \&uml_class_start,
            'UML:Package' => \&uml_package_start,
        },
        twig_handlers => {
            # purge at the end of each class section
            'UML:Class'          => sub { $_[0]->purge },
            'UML:Attribute'      => \&uml_attribute,
            'UML:Generalization' => \&uml_generalization,
            'UML:TaggedValue'    => \&uml_taggedvalue,
        },
    );

    $twig->parse ($xml);

    sub uml_class_start {
        my ($twig, $elt) = @_;
        my $xmi_id = $elt->att( 'xmi.id' );
        my $name   = $elt->att( 'name' );
    }

    sub uml_package_start {
    }

    sub uml_taggedvalue {
        my ($twig, $elt) = @_;

        if ($elt->parent (qr/UML:Attribute/)){
            uml_taggedvalue_attr (@_);
        } elsif ($elt->parent (qr/^UML:Class$/)) {
            print "Ignoring class taggedvalue\n";
        } else {
            uml_taggedvalue_def (@_);
        }
    }

    sub uml_generalization {
    }

    sub uml_attribute {
    }

    sub uml_taggedvalue_attr {
        print "uml_taggedvalue_attr\n";
    }

    sub uml_taggedvalue_def {
        print "uml_taggedvalue_def\n";
    }

    Prints:

    Ignoring class taggedvalue
    Ignoring class taggedvalue
    uml_taggedvalue_attr
    uml_taggedvalue_attr
    uml_taggedvalue_def
    uml_taggedvalue_def
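
    A possible variation, closer to the "each chunk has its own TaggedValue handling" idea raised earlier in the thread: skip the generic UML:TaggedValue handler and let the UML:Attribute handler read its own TaggedValue children once the whole Attribute has been parsed. This is only a sketch. The sample XML is trimmed, the ids C1 and A1 are made up, and @attributes is an illustrative name.

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed, hypothetical sample modelled on the XMI in the question.
my $xml = <<'XML';
<XMI xmlns:UML="omg.org/UML1.3">
  <XMI.content>
    <UML:Class name="Address" xmi.id="C1">
      <UML:ModelElement.taggedValue>
        <UML:TaggedValue tag="ea_stype" value="Class" />
      </UML:ModelElement.taggedValue>
      <UML:Classifier.feature>
        <UML:Attribute name="city">
          <UML:ModelElement.taggedValue>
            <UML:TaggedValue tag="description" value="City" />
            <UML:TaggedValue tag="ea_guid" xmi.id="A1" />
          </UML:ModelElement.taggedValue>
        </UML:Attribute>
      </UML:Classifier.feature>
    </UML:Class>
  </XMI.content>
</XMI>
XML

my @attributes;    # illustrative: one hash per UML:Attribute

sub uml_attribute {
    my ($twig, $attr) = @_;

    # The handler runs at the closing tag, so the Attribute's subtree is
    # complete. Index its TaggedValue descendants by their tag attribute.
    my %tagged = map { ( $_->att('tag') => $_ ) }
                 $attr->descendants('UML:TaggedValue');

    my $guid = $tagged{ea_guid};
    push @attributes, {
        name   => $attr->att('name'),
        xmi_id => $guid ? $guid->att('xmi.id') : undef,
    };
}

my $twig = XML::Twig->new(
    twig_handlers => {
        'UML:Attribute' => \&uml_attribute,
        # purge at the end of each class section, as in the original code
        'UML:Class'     => sub { $_[0]->purge },
    },
);
$twig->parse($xml);

for my $attribute (@attributes) {
    my $id = defined $attribute->{xmi_id} ? $attribute->{xmi_id} : 'n/a';
    print "$attribute->{name} => $id\n";
}
```

    The trade-off is that the Attribute subtree has to stay in memory until its closing tag, which should be fine as long as purging happens per Class.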

    Perl is environmentally friendly - it saves trees