thickice97 has asked for the wisdom of the Perl Monks concerning the following question:

<xs:complexType name="semt.002.001.01">
  <xs:sequence>
    <xs:element name="PrvsRef" type="AdditionalReference2" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="RltdRef" type="AdditionalReference2" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="MsgPgntn" type="Pagination"/>
    <xs:element name="StmtGnlDtls" type="Statement3"/>
    <xs:element name="AcctDtls" type="SafekeepingAccount1"/>
    <xs:element name="BalForAcct" type="AggregateBalanceInformation1" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="SubAcctDtls" type="SubAccountIdentification1" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="TtlVals" type="TotalValueInPageAndStatement" minOccurs="0" maxOccurs="1"/>
    <xs:element name="Xtnsn" type="Extension1" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="AdditionalReference2">
  <xs:sequence>
    <xs:element name="Ref" type="Max35Text"/>
    <xs:element name="RefIssr" type="PartyIdentification1Choice" minOccurs="0" maxOccurs="1"/>
    <xs:element name="MsgNm" type="Max35Text" minOccurs="0" maxOccurs="1"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="PartyIdentification1Choice">
  <xs:sequence>
    <xs:choice>
      <xs:element name="BICOrBEI" type="AnyBICIdentifier"/>
      <xs:element name="PrtryId" type="GenericIdentification1"/>
      <xs:element name="NmAndAdr" type="NameAndAddress2"/>
    </xs:choice>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="GenericIdentification1">
  <xs:sequence>
    <xs:element name="Id" type="Max35Text"/>
    <xs:element name="SchmeNm" type="Max35Text" minOccurs="0" maxOccurs="1"/>
    <xs:element name="Issr" type="Max35Text" minOccurs="0" maxOccurs="1"/>
  </xs:sequence>
</xs:complexType>
This is my XML file. I need to parse this data and output it in this format
semt.002.001.01
semt.002.001.01.PrvsRef
semt.002.001.01.PrvsRef.Ref
semt.002.001.01
semt.002.001.01.PrvsRef
semt.002.001.01.PrvsRef.RefIssr
semt.002.001.01.PrvsRef.RefIssr.BICOrBEI
I tried using XML::Simple, but I believe it is a SAX parser, and it won't give the output in this format. Which Perl module could I use for this approach, and how? An example would be helpful. Please advise.

Re: perl parsing xml
by GrandFather (Saint) on Jan 22, 2008 at 23:49 UTC

    The "kitchen sink" module for manipulating XML is XML::Twig.

    A starting point using XML::Twig to solve your problem may be:

    use strict;
    use warnings;
    use XML::Twig;

    my $xml = <<XML;
    <xs:complexType name="semt.002.001.01">
      <xs:sequence>
        <xs:element name="PrvsRef" type="AdditionalReference2" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="RltdRef" type="AdditionalReference2" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="MsgPgntn" type="Pagination"/>
        <xs:element name="StmtGnlDtls" type="Statement3"/>
        <xs:element name="AcctDtls" type="SafekeepingAccount1"/>
        <xs:element name="BalForAcct" type="AggregateBalanceInformation1" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="SubAcctDtls" type="SubAccountIdentification1" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="TtlVals" type="TotalValueInPageAndStatement" minOccurs="0" maxOccurs="1"/>
        <xs:element name="Xtnsn" type="Extension1" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
    XML

    my $twig = XML::Twig->new (twig_handlers => {'[@name]' => \&showNesting});

    $twig->parse ($xml);

    sub showNesting {
        my @path;

        for my $elt (reverse $_->ancestors_or_self ()) {
            next unless defined $elt->att ('name');
            push @path, $elt->att ('name');
        }

        print join ('.', @path), "\n" if @path;
    }

    Prints:

    semt.002.001.01.PrvsRef
    semt.002.001.01.RltdRef
    semt.002.001.01.MsgPgntn
    semt.002.001.01.StmtGnlDtls
    semt.002.001.01.AcctDtls
    semt.002.001.01.BalForAcct
    semt.002.001.01.SubAcctDtls
    semt.002.001.01.TtlVals
    semt.002.001.01.Xtnsn
    semt.002.001.01

    which doesn't do exactly what you want (I can't see exactly what you want in any case), but should get you going.


    Perl is environmentally friendly - it saves trees
      I was actually using XML::Simple, which printed something like the above. The problem is that I want something like this:
      semt.002.001.01.PrvsRef
      semt.002.001.01.RltdRef
      The output is relative to semt.002.001.01, which is the root level. This is just an example output file. The root type has various element names with associated types, such as name = PrvsRef, type = AdditionalReference2 and name = RltdRef, type = AdditionalReference2, which gives semt.002.001.01.PrvsRef; semt.002.001.01.RltdRef. Next I need to look up the type "AdditionalReference2" in the XML file; it in turn has three names with associated types, which gives semt.002.001.01.PrvsRef.Ref; semt.002.001.01.PrvsRef.RefIssr and so on. So I need to look up the names Ref, RefIssr, MsgNm in the matching xs:complexType and append them to the root-level path. The script has to find the root level and then append the intermediate nodes found in the XML file.

        If I understand what you want then adding:

        my $type = $_->att ('type');
        return unless defined $type && $type eq 'AdditionalReference2';

        following the my @path; line in showNesting does the trick and (for the sample data I used above) prints:

        semt.002.001.01.PrvsRef
        semt.002.001.01.RltdRef

        Perl is environmentally friendly - it saves trees
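
The follow-up question asks for each element's type to be looked up among the other xs:complexType definitions and its child names appended to the path. One possible way to do that with XML::Twig is sketched below; it is not taken from the replies above. It rests on several assumptions: the fragments are wrapped in an xs:schema root (the posted sample has no single root element), the sample is trimmed to three types, and the names %children_of and expand are made up for illustration. It prints each path once, which may differ from the repeated lines in the requested format.

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed sample; the xs:schema wrapper is an assumption so the input is well-formed.
my $xml = <<'XML';
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="semt.002.001.01">
    <xs:sequence>
      <xs:element name="PrvsRef" type="AdditionalReference2" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="MsgPgntn" type="Pagination"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="AdditionalReference2">
    <xs:sequence>
      <xs:element name="Ref" type="Max35Text"/>
      <xs:element name="RefIssr" type="PartyIdentification1Choice" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="PartyIdentification1Choice">
    <xs:sequence>
      <xs:choice>
        <xs:element name="BICOrBEI" type="AnyBICIdentifier"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);

# Map each complexType name to its child elements as [name, type] pairs,
# in document order. Tags are compared as plain strings, prefix included.
my %children_of;
for my $ct ($twig->root->descendants) {
    next unless $ct->gi eq 'xs:complexType';
    my $type_name = $ct->att('name');
    next unless defined $type_name;
    for my $el ($ct->descendants) {
        next unless $el->gi eq 'xs:element';
        next unless defined $el->att('name');
        push @{ $children_of{$type_name} }, [ $el->att('name'), $el->att('type') ];
    }
}

# Return $path followed by the paths of everything reachable through $type.
# $seen holds the types already on the current branch, so a type that refers
# back to itself does not recurse forever.
sub expand {
    my ($path, $type, $seen) = @_;
    my @paths = ($path);
    return @paths if !defined $type || $seen->{$type};
    my %next_seen = (%$seen, $type => 1);
    for my $child (@{ $children_of{$type} || [] }) {
        my ($name, $child_type) = @$child;
        push @paths, expand("$path.$name", $child_type, \%next_seen);
    }
    return @paths;
}

my $root_type = 'semt.002.001.01';
my @paths = expand($root_type, $root_type, {});
print "$_\n" for @paths;
```

For the trimmed sample this should print the root, then PrvsRef with its Ref and RefIssr children (and BICOrBEI beneath RefIssr), then MsgPgntn. Types with no matching xs:complexType, such as Max35Text, are treated as leaves.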