Thank you for trying to help - I am still very new to XML::Twig and I am probably not doing things as efficiently as I could be. (Sorry for the typo with $xml in the above code - I fixed it.)
I'm trying to extract certain attributes from a small set of tags in the XML document referenced above. Specifically, the UML:Class tags contain UML:Attribute tags, and I want to extract the name and xmi.id attributes from them. The general structure of this portion of the document looks like this:
<UML:Class>
  <UML:Attribute></UML:Attribute>
  <UML:Attribute></UML:Attribute>
  <UML:Attribute></UML:Attribute>
</UML:Class>
A complete example is shown under the readmore.
If I understand how XML::Twig works, a section of the tree is sent to a handler when the closing tag is reached. Therefore, for the simplified example above, each of the UML:Attribute sections will get parsed before the corresponding UML:Class section is parsed. I would like to parse the Class first (so I don't have to jump through hoops to associate the Attribute data with the Class data later), which is why the handler for UML:Attribute is located in the handler for UML:Class.
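Here is a minimal sketch of that ordering. It assumes XML::Twig is installed and uses a tiny made-up XMI-like fragment; the class and attribute names and ids are invented for illustration. With plain twig_handlers, each handler fires once its element's closing tag has been parsed, so the UML:Attribute handlers run before the UML:Class handler.

```perl
use strict;
use warnings;
use XML::Twig;
# Made-up input for illustration only (not the real XMI file)
my $xml = <<'XML';
<XMI>
  <UML:Class name="Order" xmi.id="c1">
    <UML:Attribute name="id" xmi.id="a1"/>
    <UML:Attribute name="total" xmi.id="a2"/>
  </UML:Class>
</XMI>
XML
my @events;    # records the order in which the handlers fire
my $twig = XML::Twig->new(
    twig_handlers => {
        # twig_handlers are called when the element is complete (closing tag parsed)
        'UML:Class'     => sub { push @events, 'class ' . $_[1]->att('name'); return 1 },
        'UML:Attribute' => sub { push @events, 'attr ' . $_[1]->att('name');  return 1 },
    },
);
$twig->parse($xml);
print "$_\n" for @events;
# attr id
# attr total
# class Order
```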
I admit that it seems inefficient to call sprint and then parse. I took that snippet from GrandFather's example. Is there a better way to do it?
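One way to skip the sprint-and-reparse step is sketched below, under the same assumptions: XML::Twig is installed, the input is made up, and %attrs_by_class is a hypothetical name. Inside a regular twig_handler for UML:Class, the whole class subtree has already been built, so the attributes can be read straight from the element objects with descendants and att.

```perl
use strict;
use warnings;
use XML::Twig;
# Made-up input for illustration only
my $xml = <<'XML';
<XMI>
  <UML:Class name="Order" xmi.id="c1">
    <UML:Attribute name="id" xmi.id="a1"/>
    <UML:Attribute name="total" xmi.id="a2"/>
  </UML:Class>
</XMI>
XML
my %attrs_by_class;    # class xmi.id => [ [ attribute name, attribute xmi.id ], ... ]
my $twig = XML::Twig->new(
    twig_handlers => {
        'UML:Class' => sub {
            my ( $t, $class ) = @_;
            # the complete subtree is available here, so query the objects directly
            for my $attr ( $class->descendants('UML:Attribute') ) {
                push @{ $attrs_by_class{ $class->att('xmi.id') } },
                    [ $attr->att('name'), $attr->att('xmi.id') ];
            }
            $t->purge;    # free the memory used by the class we just processed
            return 1;
        },
    },
);
$twig->parse($xml);
for my $id ( sort keys %attrs_by_class ) {
    print "$id: ", join( ', ', map { "$_->[0] ($_->[1])" } @{ $attrs_by_class{$id} } ), "\n";
}
```

The trade-off is that each UML:Class subtree is held in memory until its closing tag, which is usually fine for files of this size.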
I hope that clarifies what I'm trying to do. I'd appreciate any suggestions that you might have. Thanks.
Two things can help you here. First, you can use start_tag_handlers, which are called right after the start tag of the element has been parsed (the element object has been created, but it is still empty at that point). Second, although the UML:Attribute elements still get completely parsed before the UML:Class element, by the time you're in their handler the opening tag of UML:Class has already been parsed: if you use regular twig_handlers rather than twig_roots, the UML:Class element exists, it is an ancestor of the UML:Attribute element, and its attributes are already available. If memory is a concern, you can sprinkle purge calls to taste.
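A minimal sketch of both points, assuming XML::Twig is installed and using a made-up fragment (names and ids are invented): the start_tag_handler sees the UML:Class attributes before any of its children are parsed, and each UML:Attribute handler reaches the enclosing class through parent.

```perl
use strict;
use warnings;
use XML::Twig;
# Made-up input for illustration only
my $xml = <<'XML';
<XMI>
  <UML:Class name="Order" xmi.id="c1">
    <UML:Attribute name="id" xmi.id="a1"/>
    <UML:Attribute name="total" xmi.id="a2"/>
  </UML:Class>
</XMI>
XML
my @lines;    # collected output, in the order the handlers run
my $twig = XML::Twig->new(
    start_tag_handlers => {
        # runs right after <UML:Class ...> is parsed: attributes are set, no children yet
        'UML:Class' => sub {
            my ( $t, $class ) = @_;
            push @lines, 'class ' . $class->att('name') . ' (' . $class->att('xmi.id') . ')';
        },
    },
    twig_handlers => {
        # runs once each UML:Attribute element is complete
        'UML:Attribute' => sub {
            my ( $t, $attr ) = @_;
            # the enclosing UML:Class is already in the tree, with its attributes
            my $class = $attr->parent('UML:Class');
            push @lines, '  attr ' . $attr->att('name') . ' of ' . $class->att('xmi.id');
            return 1;
        },
        # when the class is finished, free everything parsed so far
        'UML:Class' => sub { $_[0]->purge; return 1 },
    },
);
$twig->parse($xml);
print "$_\n" for @lines;
# class Order (c1)
#   attr id of c1
#   attr total of c1
```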
So in your case I would write something like (untested):
use strict;
use warnings;
use Data::Dumper;
use XML::Twig;
my $twig = XML::Twig->new(
    start_tag_handlers => { 'UML:Class' => \&uml_class, },
    twig_handlers      => { 'UML:Class'     => sub { $_[0]->purge }, # purge at the end of each section
                            'UML:Attribute' => \&uml_attr,
                          } );
$twig->parsefile( 'testfile.xmi' );
sub uml_class
  { my ( $twig, $section ) = @_;
    # called on the start tag: the attributes are available, the children are not parsed yet
    print "data for class:\n";
    print "  name   = ", $section->att( 'name' ), "\n";
    print "  xmi.id = ", $section->att( 'xmi.id' ), "\n";
  }
sub uml_attr
  { my ( $twig, $attr ) = @_;
    # if you need the class id, it's in $attr->parent( 'UML:Class' )->att( 'xmi.id' )
    $attr->print;
    print "\n";
    # extract the data directly from the element object, no need to sprint and re-parse
    my %data = ( name => $attr->att( 'name' ), 'xmi.id' => $attr->att( 'xmi.id' ) );
    print Dumper( \%data );
    $twig->purge;
  }
Does it make sense? It probably doesn't matter that much if your files are small, but it feels better to parse each section only once.
Thank you for the example code and the detailed explanation. After cross-referencing them with the docs I think I have a better understanding of how to use the different types of handlers. The files are not that big (all should be < 10 MB), but I agree that there is no need to do extra parsing if there are more elegant ways to do it.
Thanks again for your help, and for writing XML::Twig. I am thoroughly impressed with how little code I will have to write to accomplish this task. :-)