*click* ...a lightbulb turns on...

Thank you for the example code - that made all the difference! I didn't realize that $section was actually an XML::Twig::Elt object. Using your example as a starting point, I was able to get what I needed.

At first I thought I needed to parse the XML chunks into a data structure (such as that returned by XML::Simple), which I accomplished using

my $struct = $section->simplify( forcearray => 1 );
but I soon realized that was overkill for what I really needed - the value of the attributes for the class (etc) tags. The $elt->att( $attribute ) method did the trick. An example of the class handler from my working code is below.
sub uml_class {
    my ( $twig, $section ) = @_;
    print "data for class:\n";
    print " name = ", $section->att( 'name' ), "\n";
    print " xmi.id = ", $section->att( 'xmi.id' ), "\n";
    my $subTwig = XML::Twig->new( twig_roots => { 'UML:Attribute' => \&uml_attr } );
    # $subTwig->parse( $xml ); # original code (typo)
    $subTwig->parse( $section->sprint() );
}

Thanks again! I knew I was making it too hard. :-)

Update: corrected typo in the example code

Re^3: Pulling out sections of an XMI file with XML::Twig
by mirod (Canon) on Sep 28, 2006 at 09:42 UTC

    I must admit that I don't quite understand what you are doing, or even trying to do, but it looks like you are parsing things several times (the call to parse in uml_class, though I fail to see what exactly is in $xml). That should not be necessary: you can set handlers at different levels of the tree (not twig_roots, but regular twig_handlers).

    If you could repost a complete example I might be able to help.

      Thank you for trying to help - I am still very new to XML::Twig and I am probably not doing things as efficiently as I could be. (Sorry for the typo with $xml in the above code - I fixed it.)

      I'm trying to extract certain attributes from a small set of tags in the XML document referenced above. Specifically, the UML:Class tags contain UML:Attribute tags, and I want to extract the name and xmi.id attributes from them. The general structure of this portion of the document looks like this:

      <UML:Class>
        <UML:Attribute></UML:Attribute>
        <UML:Attribute></UML:Attribute>
        <UML:Attribute></UML:Attribute>
      </UML:Class>
      A complete example is shown under the readmore.

      If I understand how XML::Twig works, a section of the tree is sent to a handler when the closing tag is reached. Therefore, for the simplified example above, each of the UML:Attribute sections will get parsed before the corresponding UML:Class section is parsed. I would like to parse the Class first (so I don't have to jump through hoops to associate the Attribute data with the Class data later), which is why the handler for UML:Attribute is located in the handler for UML:Class.
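The ordering described above can be checked with a small script. This is only a sketch: the inline document, the attribute values, the namespace URI and the @order array are made up for illustration, and only the element names come from this thread. With plain twig_handlers, both UML:Attribute handlers should fire before the UML:Class handler, because each handler runs when its closing tag is reached.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical miniature document: the element names follow the thread,
# the attribute values and namespace URI are invented for this sketch.
my $xml = <<'XML';
<UML:Model xmlns:UML="org.omg.xmi.namespace.UML">
  <UML:Class name="Person" xmi.id="c1">
    <UML:Attribute name="first" xmi.id="a1"/>
    <UML:Attribute name="last" xmi.id="a2"/>
  </UML:Class>
</UML:Model>
XML

my @order;    # records the order in which the handlers fire

my $twig = XML::Twig->new(
    twig_handlers => {
        # handlers receive ( $twig, $elt ); $_[1] is the XML::Twig::Elt
        'UML:Attribute' => sub { push @order, 'Attribute ' . $_[1]->att('name') },
        'UML:Class'     => sub { push @order, 'Class ' . $_[1]->att('name') },
    },
);
$twig->parse($xml);

print "$_\n" for @order;
```

Running it lists the two attributes first and the class last, which is the "children before parent" behaviour that makes associating the Attribute data with its Class awkward.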

      I admit that it seems inefficient to call sprint and then parse. I took that snippet from GrandFather's example. Is there a better way to do it?

      I hope that clarifies what I'm trying to do. I'd appreciate any suggestions that you might have. Thanks.

        Two things can help you here. First, you can use start_tag_handlers, which are called after the start tag of the element has been parsed (and the element object has been created, empty at that point). The UML:Attribute elements will still be completely parsed before the UML:Class, but when you're in their handler the opening tag of UML:Class has already been parsed. Second, if you don't use twig_roots but regular twig_handlers, the UML:Class element exists, it's an ancestor of the UML:Attribute element, and its attributes are already available. If space is a problem, you can sprinkle purge calls to taste.

        So in your case I would write something like (untested):

        use strict;
        use warnings;
        use Data::Dumper;
        use XML::Twig;

        my $twig = XML::Twig->new(
            start_tag_handlers => {
                'UML:Class' => \&uml_class,
            },
            twig_handlers => {
                'UML:Class'     => sub { $_[0]->purge },    # purge at the end of each section
                'UML:Attribute' => \&uml_attr,
            },
        );
        $twig->parsefile( 'testfile.xmi' );

        sub uml_class {
            my ( $twig, $section ) = @_;
            print "data for class:\n";
            print " name = ", $section->att( 'name' ), "\n";
            print " xmi.id = ", $section->att( 'xmi.id' ), "\n";
        }

        sub uml_attr {
            my ( $twig, $attr ) = @_;
            # if you need the class id, it's in $attr->parent( 'UML:Class' )->att( 'xmi.id' )
            $attr->print;
            # parse the block and extract the data elements
            my $struct = $attr->simplify( forcearray => 1 );
            print Dumper( $struct );
            $twig->purge;
        }

        Does that make sense? It probably doesn't matter much if your files are small, but it feels better to parse each section only once.