bobf has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to define handlers for XML::Twig that will allow me to parse out different sections of an XMI document (created by exporting a UML model from Enterprise Architect) and process them separately, but I can't seem to grok the syntax. A very simplified version of the document looks like this:

    <XMI xmlns:UML="omg.org/UML1.3" xmi.version="1.1" timestamp="2006-08-04 16:29:41">
      <XMI.header>
        <XMI.documentation>
          <XMI.exporter>Enterprise Architect</XMI.exporter>
          <XMI.exporterVersion>4.1RR</XMI.exporterVersion>
        </XMI.documentation>
      </XMI.header>
      <XMI.content>
        <UML:Model name="EA Model" xmi.id="MX_EAID_...">
          <UML:Namespace.ownedElement xmi.id="MX_EAID_...">
            <!-- note the UML:Class tag on the next line -->
            <UML:Class name="EARootClass" xmi.id="EAID_..." />
            <UML:Package name="Logical View" xmi.id="EAPK...">
              <UML:Namespace.ownedElement xmi.id="EAPK...">
                <UML:Package name="Logical Model" xmi.id="EAPK...">
                  <UML:Namespace.ownedElement xmi.id="EAPK...">
                    <!-- many UML:Package and UML:Namespace tags removed for brevity -->
                    <!-- I want to pull out the following UML:Class block -->
                    <UML:Class name="DataType" xmi.id="EAID...">
                      <UML:Classifier.feature xmi.id="EAID...">
                        <!-- I want to pull out the following UML:Attribute blocks -->
                        <UML:Attribute name="dataTypeId">
                        </UML:Attribute>
                        <UML:Attribute name="name">
                        </UML:Attribute>
                      </UML:Classifier.feature>
                    </UML:Class>
                    <!-- I want to pull out the following UML:Association blocks -->
                    <UML:Association xmi.id="EAID...">
                    </UML:Association>
                    <UML:Association xmi.id="EAID...">
                    </UML:Association>
                    <!-- many UML:Class and UML:Association blocks removed (I want these, too) -->
                    <!-- miscellaneous blocks removed -->
                  </UML:Namespace.ownedElement>
                </UML:Package>
              </UML:Namespace.ownedElement>
            </UML:Package>
            <UML:DataType xmi.id="eaxmiid3" />
            <UML:DataType xmi.id="eaxmiid1" />
          </UML:Namespace.ownedElement>
        </UML:Model>
        <!-- I want to process each of the UML:TaggedValue elements -->
        <UML:TaggedValue tag="complexity" />
        <UML:TaggedValue tag="ea_stype" />
        <!-- many UML:TaggedValue tags removed -->
      </XMI.content>
      <XMI.extensions xmi.extender="Enterprise Architect 2.5" />
    </XMI>

I tried a few things, but the closest I came to getting what I want is the following:

    use strict;
    use warnings;
    use Data::Dumper;
    use XML::Twig;

    my $twig = XML::Twig->new( twig_roots => { 'UML:Class' => \&uml_class } );
    $twig->parsefile( 'testfile.xmi' );

    sub uml_class {
        my ( $twig, $section ) = @_;

        my $elt = $twig->first_elt;
        my $struct = $elt->simplify( forcearray => 1 );
        print Dumper( $struct );

        # parse the block and extract the data elements

        $twig->purge;
    }

I can't seem to figure out how to grab a UML:Class block and send it to a parser: the block always contains additional root elements and/or is missing child tags. I think I can figure out how to write separate parsers for the UML:Class, UML:Attribute, UML:Association, and UML:TaggedValue tags, but I need to isolate them first.

I must be making this too difficult. Could someone clue me in, please?

Thanks in advance!

Re: Pulling out sections of an XMI file with XML::Twig
by GrandFather (Saint) on Sep 27, 2006 at 21:35 UTC

    Is not $section what you want as a starting point?

        sub uml_class {
            my ( $twig, $section ) = @_;
            $section->print ();
        }

    Prints:

        <UML:Class name="EARootClass" xmi.id="EAID_..."/><UML:Class name="DataType" xmi.id="EAID..."><UML:Classifier.feature xmi.id="EAID...">
        <!-- I want to pull out the following UML:Attribute blocks -->
        <UML:Attribute name="dataTypeId"></UML:Attribute><UML:Attribute name="name"></UML:Attribute></UML:Classifier.feature></UML:Class>

    Update:

    Note the "Processing just parts of an XML document" section in the XML::Twig documentation, which describes twig_roots, and in particular the line my( $t, $elt)= @_; in that section's sample code.

    Update:

    and to reparse the sub-element:

        sub uml_class {
            my ( $twig, $section ) = @_;
            my $xml = $section->sprint ();
            my $subTwig = XML::Twig->new (
                # tag names are case-sensitive, so this must match the document
                twig_roots => { 'UML:Attribute' => \&uml_attr }
            );

            $subTwig->parse ($xml);
        }

        sub uml_attr {
            my ($twig, $elt) = @_;

            $elt->print ();
            print "\n";
        }

    Prints:

        <UML:Attribute name="dataTypeId"></UML:Attribute>
        <UML:Attribute name="name"></UML:Attribute>

    DWIM is Perl's answer to Gödel

      *click* ...a lightbulb turns on...

      Thank you for the example code - that made all the difference! I didn't realize that $section was actually an XML::Twig::Elt object. Using your example as a starting point, I was able to get what I needed.

      At first I thought I needed to parse the XML chunks into a data structure (such as that returned by XML::Simple), which I accomplished using

          my $struct = $section->simplify( forcearray => 1 );

      but I soon realized that was overkill for what I really needed: the values of the attributes on the class (etc.) tags. The $elt->att( $attribute ) method did the trick. An example of the class handler from my working code is below.

          sub uml_class {
              my ( $twig, $section ) = @_;

              print "data for class:\n";
              print " name = ", $section->att( 'name' ), "\n";
              print " xmi.id = ", $section->att( 'xmi.id' ), "\n";

              my $subTwig = XML::Twig->new(
                  twig_roots => { 'UML:Attribute' => \&uml_attr }
              );

              # $subTwig->parse( $xml );  # original code (typo)
              $subTwig->parse( $section->sprint() );
          }

      Thanks again! I knew I was making it too hard. :-)

      Update: corrected typo in the example code

        I must admit that I don't quite understand what you are doing, or even trying to do, but it looks like you are parsing things several times (the call to parse in uml_class, though I fail to see exactly what's in $xml). That should not be necessary: you can set handlers at different levels of the tree (not twig_roots, but regular twig_handlers).

        If you could repost a complete example I might be able to help.
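
The suggestion above, regular twig_handlers at several levels instead of re-parsing each class with a second twig, might look roughly like the sketch below. It is only an illustration under stated assumptions, not anyone's working code from this thread: the inline document is a trimmed copy of the sample in the question (elided ids kept as posted), and the four collector arrays are hypothetical names introduced here. Handlers fire when an element's closing tag is seen, so the UML:Attribute handler runs while its enclosing UML:Class is still open and can look it up with parent().

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed copy of the sample from the question; the elided ids ("EAID...")
# are kept exactly as posted rather than filled in.
my $xmi = <<'END_XMI';
<XMI xmlns:UML="omg.org/UML1.3" xmi.version="1.1">
  <XMI.content>
    <UML:Model name="EA Model">
      <UML:Namespace.ownedElement>
        <UML:Class name="EARootClass" xmi.id="EAID_..." />
        <UML:Class name="DataType" xmi.id="EAID...">
          <UML:Classifier.feature xmi.id="EAID...">
            <UML:Attribute name="dataTypeId"></UML:Attribute>
            <UML:Attribute name="name"></UML:Attribute>
          </UML:Classifier.feature>
        </UML:Class>
        <UML:Association xmi.id="EAID..."></UML:Association>
      </UML:Namespace.ownedElement>
    </UML:Model>
    <UML:TaggedValue tag="complexity" />
    <UML:TaggedValue tag="ea_stype" />
  </XMI.content>
</XMI>
END_XMI

# Hypothetical collectors, introduced only for this sketch.
my ( @classes, @attributes, @associations, @tagged_values );

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called at each </UML:Attribute>. The enclosing class is not closed
        # yet, so parent() with a condition walks up to it.
        'UML:Attribute' => sub {
            my ( $t, $attr ) = @_;
            my $class = $attr->parent( 'UML:Class' );
            push @attributes, {
                class => $class ? $class->att( 'name' ) : undef,
                name  => $attr->att( 'name' ),
            };
        },

        # Called at the end of each UML:Class, after its attribute handlers.
        'UML:Class' => sub {
            my ( $t, $class ) = @_;
            push @classes, {
                name => $class->att( 'name' ),
                id   => $class->att( 'xmi.id' ),
            };
            $t->purge;    # values are copied out already, so release memory
            return 1;
        },

        'UML:Association' => sub {
            my ( $t, $assoc ) = @_;
            push @associations, $assoc->att( 'xmi.id' );
        },

        'UML:TaggedValue' => sub {
            my ( $t, $tagged ) = @_;
            push @tagged_values, $tagged->att( 'tag' );
        },
    },
);

$twig->parse( $xmi );    # a single pass over the document

print "class $_->{name} ($_->{id})\n"        for @classes;
print "attribute $_->{class}.$_->{name}\n"   for @attributes;
print "association $_\n"                     for @associations;
print "tagged value $_\n"                    for @tagged_values;
```

Each handler copies out the attribute values it needs straight away, which is why the purge in the class handler is safe in this sketch. With a real export, replace the parse call with parsefile on the file name.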