in reply to Re^3: Pulling out sections of an XMI file with XML::Twig
in thread Pulling out sections of an XMI file with XML::Twig

Thank you for trying to help - I am still very new to XML::Twig and I am probably not doing things as efficiently as I could be. (Sorry for the typo with $xml in the above code - I fixed it.)

I'm trying to extract certain attributes from a small set of tags in the XML document referenced above. Specifically, the UML:Class tags contain UML:Attribute tags, and I want to extract the name and xmi.id attributes from them. The general structure of this portion of the document looks like this:

<UML:Class>
  <UML:Attribute></UML:Attribute>
  <UML:Attribute></UML:Attribute>
  <UML:Attribute></UML:Attribute>
</UML:Class>

A complete example follows.

<UML:Class name="Gel2d" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92" visibility="public" namespace="EAPK_AEE09178_6653_49a3_91E8_7CD6004767CA" isRoot="false" isLeaf="false" isAbstract="false" isActive="false">
  <UML:GeneralizableElement.generalization xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_0">
    <Foundation.Core.Generalization xmi.idref="EAID_8E74DB6E_6F4A_451b_A1E2_17F60ED264AE" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_0_fix_0" />
  </UML:GeneralizableElement.generalization>
  <UML:Classifier.feature xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2">
    <UML:Attribute name="loading" changeable="none" visibility="public" ownerScope="instance" targetScope="instance" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_0">
      <UML:StructuralFeature.multiplicity xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_0_fix_0">
        <UML:Multiplicity xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_0_fix_0_fix_0">
          <UML:Multiplicity.range xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_0_fix_0_fix_0_fix_0">
            <UML:MultiplicityRange lower="1" upper="1" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_0_fix_0_fix_0_fix_0_fix_0" />
          </UML:Multiplicity.range>
        </UML:Multiplicity>
      </UML:StructuralFeature.multiplicity>
      <UML:Attribute.initialValue xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_0_fix_1">
        <UML:Expression xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_0_fix_1_fix_0" />
      </UML:Attribute.initialValue>
      <UML:StructuralFeature.type xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_0_fix_2">
        <Foundation.Core.Classifier xmi.idref="eaxmiid3" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_0_fix_2_fix_0" />
      </UML:StructuralFeature.type>
    </UML:Attribute>
    <UML:Attribute name="minPhRange" changeable="none" visibility="public" ownerScope="instance" targetScope="instance" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_1">
      <UML:StructuralFeature.multiplicity xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_1_fix_0">
        <UML:Multiplicity xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_1_fix_0_fix_0">
          <UML:Multiplicity.range xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_1_fix_0_fix_0_fix_0">
            <UML:MultiplicityRange lower="1" upper="1" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_1_fix_0_fix_0_fix_0_fix_0" />
          </UML:Multiplicity.range>
        </UML:Multiplicity>
      </UML:StructuralFeature.multiplicity>
      <UML:Attribute.initialValue xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_1_fix_1">
        <UML:Expression xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_1_fix_1_fix_0" />
      </UML:Attribute.initialValue>
      <UML:StructuralFeature.type xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_1_fix_2">
        <Foundation.Core.Classifier xmi.idref="eaxmiid3" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_1_fix_2_fix_0" />
      </UML:StructuralFeature.type>
    </UML:Attribute>
    <UML:Attribute name="maxPhRange" changeable="none" visibility="public" ownerScope="instance" targetScope="instance" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_2">
      <UML:StructuralFeature.multiplicity xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_2_fix_0">
        <UML:Multiplicity xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_2_fix_0_fix_0">
          <UML:Multiplicity.range xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_2_fix_0_fix_0_fix_0">
            <UML:MultiplicityRange lower="1" upper="1" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_2_fix_0_fix_0_fix_0_fix_0" />
          </UML:Multiplicity.range>
        </UML:Multiplicity>
      </UML:StructuralFeature.multiplicity>
      <UML:Attribute.initialValue xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_2_fix_1">
        <UML:Expression xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_2_fix_1_fix_0" />
      </UML:Attribute.initialValue>
      <UML:StructuralFeature.type xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_2_fix_2">
        <Foundation.Core.Classifier xmi.idref="eaxmiid3" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_2_fix_2_fix_0" />
      </UML:StructuralFeature.type>
    </UML:Attribute>
    <UML:Attribute name="firstDimDate" changeable="none" visibility="public" ownerScope="instance" targetScope="instance" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_3">
      <UML:StructuralFeature.multiplicity xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_3_fix_0">
        <UML:Multiplicity xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_3_fix_0_fix_0">
          <UML:Multiplicity.range xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_3_fix_0_fix_0_fix_0">
            <UML:MultiplicityRange lower="1" upper="1" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_3_fix_0_fix_0_fix_0_fix_0" />
          </UML:Multiplicity.range>
        </UML:Multiplicity>
      </UML:StructuralFeature.multiplicity>
      <UML:Attribute.initialValue xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_3_fix_1">
        <UML:Expression xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_3_fix_1_fix_0" />
      </UML:Attribute.initialValue>
      <UML:StructuralFeature.type xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_3_fix_2">
        <Foundation.Core.Classifier xmi.idref="eaxmiid4" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_3_fix_2_fix_0" />
      </UML:StructuralFeature.type>
    </UML:Attribute>
    <UML:Attribute name="secondDimDate" changeable="none" visibility="public" ownerScope="instance" targetScope="instance" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_4">
      <UML:StructuralFeature.multiplicity xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_4_fix_0">
        <UML:Multiplicity xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_4_fix_0_fix_0">
          <UML:Multiplicity.range xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_4_fix_0_fix_0_fix_0">
            <UML:MultiplicityRange lower="1" upper="1" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_4_fix_0_fix_0_fix_0_fix_0" />
          </UML:Multiplicity.range>
        </UML:Multiplicity>
      </UML:StructuralFeature.multiplicity>
      <UML:Attribute.initialValue xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_4_fix_1">
        <UML:Expression xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_4_fix_1_fix_0" />
      </UML:Attribute.initialValue>
      <UML:StructuralFeature.type xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_4_fix_2">
        <Foundation.Core.Classifier xmi.idref="eaxmiid4" xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_2_fix_4_fix_2_fix_0" />
      </UML:StructuralFeature.type>
    </UML:Attribute>
  </UML:Classifier.feature>
  <UML:Namespace.ownedElement xmi.id="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92_fix_3">
    <UML:Generalization child="EAID_6AF3DB85_3AFE_4843_B8D3_0E61B46C6D92" parent="EAID_68C27150_2324_4f98_AECD_172A1251AB21" xmi.id="EAID_8E74DB6E_6F4A_451b_A1E2_17F60ED264AE" visibility="public" />
  </UML:Namespace.ownedElement>
</UML:Class>

If I understand how XML::Twig works, a section of the tree is sent to a handler when the closing tag is reached. Therefore, for the simplified example above, each of the UML:Attribute sections will get parsed before the corresponding UML:Class section is parsed. I would like to parse the Class first (so I don't have to jump through hoops to associate the Attribute data with the Class data later), which is why the handler for UML:Attribute is located in the handler for UML:Class.
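
The firing order described above can be checked with a small script. This is only a minimal sketch, assuming a cut-down, hypothetical document: the xmlns URI and the short xmi.id values are made up for illustration and are not taken from the real XMI file. It records which twig_handlers callback runs when.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical, cut-down document: one class holding two attributes.
my $xml = <<'XML';
<XMI xmlns:UML="http://example.org/uml">
  <UML:Class name="Gel2d" xmi.id="C1">
    <UML:Attribute name="loading" xmi.id="A1"/>
    <UML:Attribute name="minPhRange" xmi.id="A2"/>
  </UML:Class>
</XMI>
XML

my @order;    # records the order in which the handlers fire

my $twig = XML::Twig->new(
    twig_handlers => {
        # twig_handlers run once the element's closing tag has been parsed
        'UML:Class'     => sub { push @order, 'Class ' . $_[1]->att('name') },
        'UML:Attribute' => sub { push @order, 'Attribute ' . $_[1]->att('name') },
    },
);
$twig->parse($xml);

print "$_\n" for @order;
```

With this input the two Attribute lines are printed before the Class line, because each UML:Attribute closes before its enclosing UML:Class does.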

I admit that it seems inefficient to call sprint and then parse. I took that snippet from GrandFather's example. Is there a better way to do it?

I hope that clarifies what I'm trying to do. I'd appreciate any suggestions that you might have. Thanks.

Re^5: Pulling out sections of an XMI file with XML::Twig
by mirod (Canon) on Sep 28, 2006 at 16:52 UTC

    2 things can help you here. First, you can use start_tag_handlers, which are called after the start tag of the element has been parsed (and the element object has been created, empty at that point). Second, although the UML:Attribute element will still get completely parsed before the UML:Class, when you're in its handler the opening tag of UML:Class has already been parsed. If you don't use twig_roots but regular twig_handlers, the UML:Class element exists, it's an ancestor of the UML:Attribute element, and its attributes are already available. If space is a problem, you can sprinkle purge calls to taste.

    So in your case I would write something like (untested):

    use strict;
    use warnings;
    use Data::Dumper;
    use XML::Twig;

    my $twig = XML::Twig->new(
        # called as soon as the opening tag has been parsed
        start_tag_handlers => {
            'UML:Class' => \&uml_class,
        },
        # called once the closing tag has been parsed
        twig_handlers => {
            'UML:Class'     => sub { $_[0]->purge },    # purge at the end of each section
            'UML:Attribute' => \&uml_attr,
        },
    );

    $twig->parsefile( 'testfile.xmi' );

    sub uml_class {
        my ( $twig, $section ) = @_;
        print "data for class:\n";
        print "  name   = ", $section->att( 'name' ),   "\n";
        print "  xmi.id = ", $section->att( 'xmi.id' ), "\n";
    }

    sub uml_attr {
        my ( $twig, $attr ) = @_;
        # if you need the class id, it's in
        # $attr->parent( 'UML:Class' )->att( 'xmi.id' )
        $attr->print;
        # parse the block and extract the data elements;
        # here we just dump the element's own attributes
        print Dumper( $attr->atts );
        $twig->purge;
    }

    Does it make sense? It probably doesn't matter that much if your files are small, but it feels better to parse each section only once.
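
    To make the single-pass idea concrete, here is a minimal self-contained sketch that collects the name and xmi.id of every UML:Attribute under its class in one parse. The inline document, the xmlns URI, the short ids and the %class structure are all assumptions made for illustration; only the handler types, att, parent and purge come from the discussion above.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical, cut-down document (ids shortened for readability).
my $xml = <<'XML';
<XMI xmlns:UML="http://example.org/uml">
  <UML:Class name="Gel2d" xmi.id="C1">
    <UML:Classifier.feature xmi.id="C1_fix_2">
      <UML:Attribute name="loading" xmi.id="C1_fix_2_fix_0"/>
      <UML:Attribute name="minPhRange" xmi.id="C1_fix_2_fix_1"/>
    </UML:Classifier.feature>
  </UML:Class>
</XMI>
XML

my %class;    # class xmi.id => { name => ..., attributes => [ ... ] }

my $twig = XML::Twig->new(
    start_tag_handlers => {
        # runs at the opening tag, so the class entry exists
        # before any of its attributes are handled
        'UML:Class' => sub {
            my ( $t, $elt ) = @_;
            $class{ $elt->att('xmi.id') } =
                { name => $elt->att('name'), attributes => [] };
        },
    },
    twig_handlers => {
        'UML:Attribute' => sub {
            my ( $t, $attr ) = @_;
            # nearest enclosing UML:Class (not necessarily the direct parent)
            my $owner = $attr->parent('UML:Class');
            push @{ $class{ $owner->att('xmi.id') }{attributes} },
                { name => $attr->att('name'), id => $attr->att('xmi.id') };
        },
        # free memory once a whole class has been processed
        'UML:Class' => sub { $_[0]->purge },
    },
);
$twig->parse($xml);

for my $id ( sort keys %class ) {
    print "$class{$id}{name} ($id)\n";
    print "  $_->{name} ($_->{id})\n" for @{ $class{$id}{attributes} };
}
```

    Keying the hash on the class xmi.id is just one possible choice; any structure that is filled in from the start tag handler would avoid re-parsing the section.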

      Thank you for the example code and the detailed explanation. After cross-referencing them with the docs I think I have a better understanding of how to use the different types of handlers. The files are not that big (all should be < 10 MB), but I agree that there is no need to do extra parsing if there are more elegant ways to do it.

      Thanks again for your help, and for writing XML::Twig. I am thoroughly impressed with how little code I will have to write to accomplish this task. :-)