Re^3: Pulling out sections of an XMI file with XML::Twig

Re^4: Pulling out sections of an XMI file with XML::Twig by bobf (Monsignor) on Sep 28, 2006 at 16:00 UTC
Thank you for trying to help. I am still very new to XML::Twig and I am probably not doing things as efficiently as I could be. (Sorry for the typo with `$xml` in the above code; I fixed it.)

I'm trying to extract certain attributes from a small set of tags in the XML document referenced above. Specifically, the UML:Class tags contain UML:Attribute tags, and I want to extract the name and xmi.id attributes from them. The general structure of this portion of the document looks like this:

    <UML:Class>
      <UML:Attribute></UML:Attribute>
      <UML:Attribute></UML:Attribute>
      <UML:Attribute></UML:Attribute>
    </UML:Class>

(A complete example was attached under a readmore in the original post.)

If I understand how XML::Twig works, a section of the tree is sent to a handler when the closing tag is reached. Therefore, for the simplified example above, each of the UML:Attribute sections will get parsed before the corresponding UML:Class section is parsed. I would like to parse the Class first (so I don't have to jump through hoops to associate the Attribute data with the Class data later), which is why the handler for UML:Attribute is located in the handler for UML:Class.

I admit that it seems inefficient to call `sprint` and then `parse`. I took that snippet from GrandFather's example. Is there a better way to do it?

I hope that clarifies what I'm trying to do. I'd appreciate any suggestions that you might have. Thanks.
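The handler ordering described above can be seen in a minimal sketch. The inline XML, the `XMI` root element, the namespace URI, and the class and attribute names are all made up for illustration; only the UML:Class / UML:Attribute nesting comes from the post.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document, modelled on the simplified structure above.
my $xml = <<'XML';
<XMI xmlns:UML="org.omg.xmi.namespace.UML">
  <UML:Class name="Customer" xmi.id="c1">
    <UML:Attribute name="first_name" xmi.id="a1"/>
    <UML:Attribute name="last_name" xmi.id="a2"/>
  </UML:Class>
</XMI>
XML

my @order;    # records the order in which the handlers fire

my $twig = XML::Twig->new(
    twig_handlers => {
        # twig_handlers are called once the element is completely parsed,
        # i.e. when its closing tag has been seen
        'UML:Attribute' => sub { push @order, 'Attribute ' . $_[1]->att('name') },
        'UML:Class'     => sub { push @order, 'Class ' . $_[1]->att('name') },
    },
);
$twig->parse($xml);

print "$_\n" for @order;
```

With this sample the two Attribute lines are printed before the Class line, because the class element is only complete after both of its children.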
Re^5: Pulling out sections of an XMI file with XML::Twig by mirod (Canon) on Sep 28, 2006 at 16:52 UTC
Two things can help you here.

First, you can use `start_tag_handlers`, which are called after the start tag of the element has been parsed (and the element object has been created, empty at that point). The UML:Attribute elements will still be completely parsed before the UML:Class element, but by the time you are in their handler the opening tag of UML:Class has already been parsed.

Second, if you use regular `twig_handlers` instead of `twig_roots`, the UML:Class element exists, it is an ancestor of the UML:Attribute element, and its attributes are already available. If space is a problem, you can sprinkle `purge` calls to taste.

So in your case I would write something like (untested):

    use strict;
    use warnings;
    use Data::Dumper;
    use XML::Twig;

    my $twig = XML::Twig->new(
        start_tag_handlers => {
            'UML:Class' => \&uml_class,
        },
        twig_handlers => {
            'UML:Class'     => sub { $_[0]->purge },    # purge at the end of each section
            'UML:Attribute' => \&uml_attr,
        },
    );
    $twig->parsefile( 'testfile.xmi' );

    # called as soon as the opening tag of a UML:Class has been parsed
    sub uml_class {
        my ( $twig, $section ) = @_;
        print "data for class:\n";
        print "  name = ",   $section->att( 'name' ),   "\n";
        print "  xmi.id = ", $section->att( 'xmi.id' ), "\n";
    }

    # called once a UML:Attribute has been completely parsed
    sub uml_attr {
        my ( $twig, $attr ) = @_;
        # if you need the class id, it's in
        # $attr->parent( 'UML:Class' )->att( 'xmi.id' )
        $attr->print;
        # extract the data elements from the attribute
        my $struct = {
            name     => $attr->att( 'name' ),
            'xmi.id' => $attr->att( 'xmi.id' ),
        };
        print Dumper( $struct );
        $twig->purge;
    }

Does it make sense? It probably doesn't matter that much if your files are small, but it feels better to parse each section only once.
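For anyone who wants to try the idea without an XMI file at hand, here is a self-contained sketch of the same approach that collects the data into a hash instead of printing it. The inline XML, the `XMI` root element, the namespace URI, and the `%classes` layout are assumptions made for illustration, not something taken from the original document.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document; real XMI files are much more deeply nested.
my $xml = <<'XML';
<XMI xmlns:UML="org.omg.xmi.namespace.UML">
  <UML:Class name="Customer" xmi.id="c1">
    <UML:Attribute name="first_name" xmi.id="a1"/>
    <UML:Attribute name="last_name" xmi.id="a2"/>
  </UML:Class>
  <UML:Class name="Order" xmi.id="c2">
    <UML:Attribute name="total" xmi.id="a3"/>
  </UML:Class>
</XMI>
XML

my %classes;    # class xmi.id => { name => ..., attributes => [ ... ] }

my $twig = XML::Twig->new(
    start_tag_handlers => {
        # runs when the opening tag is parsed, so the class entry
        # exists before any of its attributes are handled
        'UML:Class' => sub {
            my ( $t, $class ) = @_;
            $classes{ $class->att('xmi.id') } = {
                name       => $class->att('name'),
                attributes => [],
            };
        },
    },
    twig_handlers => {
        'UML:Attribute' => sub {
            my ( $t, $attr ) = @_;
            # nearest enclosing UML:Class; skip attributes outside a class
            my $class = $attr->parent('UML:Class') or return 1;
            push @{ $classes{ $class->att('xmi.id') }{attributes} },
                { name => $attr->att('name'), 'xmi.id' => $attr->att('xmi.id') };
        },
        # free the memory used by the class once it is complete
        'UML:Class' => sub { $_[0]->purge },
    },
);
$twig->parse($xml);

for my $id ( sort keys %classes ) {
    my @names = map { $_->{name} } @{ $classes{$id}{attributes} };
    print "$classes{$id}{name} ($id): @names\n";
}
```

Because the values are copied into `%classes` as plain strings, purging after each class does not affect the collected data.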
Re^6: Pulling out sections of an XMI file with XML::Twig by bobf (Monsignor) on Sep 28, 2006 at 17:32 UTC
Thank you for the example code and the detailed explanation. After cross-referencing them with the docs I think I have a better understanding of how to use the different types of handlers. The files are not that big (all should be < 10 MB), but I agree that there is no need to do extra parsing if there are more elegant ways to do it. Thanks again for your help, and for writing XML::Twig. I am thoroughly impressed with how little code I will have to write to accomplish this task. :-)