joppei has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to get some records out of one or more XML files. The point of my code is to print out all the node values. As long as there aren't multiple "twin" nodes, it works fine. But the node <BaseItemDetails> can appear multiple times in one record. The XML looks like this:
<Invoice>
  <InvoiceHeader>
    <InvoiceType></InvoiceType>
    <InvoiceStatus></InvoiceStatus>
    <InvoiceNumber></InvoiceNumber>
    <InvoiceDate></InvoiceDate>
    <Supplier>
      <Name> </Name>
      <OrgNumber></OrgNumber>
      <VatId></VatId>
    </Supplier>
  </InvoiceHeader>
  <InvoiceDetails>
    <BaseItemDetails>
      <Description></Description>
      <PerQuantity></PerQuantity>
      <QuantityInvoiced></QuantityInvoiced>
      <UnitOfMeasure></UnitOfMeasure>
      <LineItemGrossAmount></LineItemGrossAmount>
      <LineItemAmount></LineItemAmount>
      <SuppliersProductId></SuppliersProductId>
      <UnitPrice></UnitPrice>
      <StartDate></StartDate>
      <EndDate></EndDate>
    </BaseItemDetails>
    <BaseItemDetails>
      -------------------
      --------------------
      -----------------
    </BaseItemDetails>
  </InvoiceDetails>
</Invoice>
So far I have this code, which I want to change to make this work:
foreach my $row ($xp->findnodes('Invoice')) {
    my $description = $row->find('InvoiceDetails/BaseItemDetails/Description')->string_value;
    -----------
    -----------
    print "$description ";
}
How can I extract the <BaseItemDetails> for each record? I have no problem getting the data I need until I reach the multiple "similar" records. I can get the first "record" but am clueless about how to get the next. I've also managed to get all the <BaseItemDetails> printed out, but then I have no way of knowing which record they belonged to in the first place, which is a big point. What's the proper way to do this? Do I need something different from XPath? I'm pretty new to Perl, so really any tips or advice are appreciated.

Replies are listed 'Best First'.
Re: Problems with multiple records in xml file , Xpath
by toolic (Bishop) on Mar 02, 2010 at 17:12 UTC
    Here is a simple way to do it using XML::Twig:
    use strict;
    use warnings;
    use XML::Twig;

    my $xmlStr = <<XML;
    <Invoice>
      <InvoiceHeader>
        <InvoiceType></InvoiceType>
        <InvoiceStatus></InvoiceStatus>
        <InvoiceNumber></InvoiceNumber>
        <InvoiceDate></InvoiceDate>
        <Supplier>
          <Name> </Name>
          <OrgNumber></OrgNumber>
          <VatId></VatId>
        </Supplier>
      </InvoiceHeader>
      <InvoiceDetails>
        <BaseItemDetails>
          <Description>desc1</Description>
          <PerQuantity></PerQuantity>
          <QuantityInvoiced></QuantityInvoiced>
          <UnitOfMeasure></UnitOfMeasure>
          <LineItemGrossAmount></LineItemGrossAmount>
          <LineItemAmount></LineItemAmount>
          <SuppliersProductId></SuppliersProductId>
          <UnitPrice></UnitPrice>
          <StartDate></StartDate>
          <EndDate></EndDate>
        </BaseItemDetails>
        <BaseItemDetails>
          <Description>desc2</Description>
          <PerQuantity></PerQuantity>
          <QuantityInvoiced></QuantityInvoiced>
          <UnitOfMeasure></UnitOfMeasure>
          <LineItemGrossAmount></LineItemGrossAmount>
          <LineItemAmount></LineItemAmount>
          <SuppliersProductId></SuppliersProductId>
          <UnitPrice></UnitPrice>
          <StartDate></StartDate>
          <EndDate></EndDate>
        </BaseItemDetails>
      </InvoiceDetails>
    </Invoice>
    XML

    my $twig = XML::Twig->new(
        twig_handlers => {
            'BaseItemDetails/Description' => sub { print $_->text(), "\n" }
        }
    );
    $twig->parse($xmlStr);

    __END__
    desc1
    desc2
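The handler above prints every description, but does not show which invoice each one belongs to. A minimal sketch of one way to keep that association with XML::Twig is to put the handler on Invoice instead and walk its children there. The <Invoices> wrapper element and the sample values (1001, 1002, desc3) are assumptions made up for illustration; they are not from the original file.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data: two invoices under an assumed <Invoices> root.
my $xml = <<'XML';
<Invoices>
  <Invoice>
    <InvoiceHeader><InvoiceNumber>1001</InvoiceNumber></InvoiceHeader>
    <InvoiceDetails>
      <BaseItemDetails><Description>desc1</Description></BaseItemDetails>
      <BaseItemDetails><Description>desc2</Description></BaseItemDetails>
    </InvoiceDetails>
  </Invoice>
  <Invoice>
    <InvoiceHeader><InvoiceNumber>1002</InvoiceNumber></InvoiceHeader>
    <InvoiceDetails>
      <BaseItemDetails><Description>desc3</Description></BaseItemDetails>
    </InvoiceDetails>
  </Invoice>
</Invoices>
XML

my @lines;    # collected output, one entry per printed line

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once per complete <Invoice> element
        Invoice => sub {
            my ($t, $invoice) = @_;

            my $number = $invoice->first_child('InvoiceHeader')
                                 ->first_child_text('InvoiceNumber');
            push @lines, "Invoice $number";

            # Only the items that live inside this invoice
            my $details = $invoice->first_child('InvoiceDetails');
            for my $item ($details->children('BaseItemDetails')) {
                push @lines, '  DES: ' . $item->first_child_text('Description');
            }

            $t->purge;    # free the memory used by the invoice just handled
        },
    },
);
$twig->parse($xml);

print "$_\n" for @lines;
```

Because the handler only ever sees one invoice at a time, the items cannot get mixed up between records, and the purge call keeps memory use flat on large files.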
Re: Problems with multiple records in xml file , Xpath
by stefbv (Priest) on Mar 02, 2010 at 17:22 UTC

    First we find all BaseItemDetails, then extract Description from each.

    use strict;
    use warnings;
    use XML::XPath;
    use XML::XPath::XMLParser;

    my $xp = XML::XPath->new(filename => 'invoice.xml');

    my $nodeset = $xp->find('//Invoice/InvoiceDetails/BaseItemDetails');

    foreach my $node ($nodeset->get_nodelist) {
        my $description = $node->find('Description')->string_value;
        print " DES: $description\n";
    }

    Update: There is a nice tutorial, "Using Perl XPath for converting Infopath XML files to Word Documents", which was also the source of my inspiration ;)

      Thanks for the replies. I tried your code, and yes, it gets all the BaseItemIds, but now they get printed 16 times, once for each Invoice record. Is there any way to make the code print only the corresponding BaseItemIds, and not run through the whole file each time? Or do I have to change something in the XML to achieve this? I'll try to explain: I want to print all the info from the XML, not just the BaseItemIds, so that I get this output:

      Invoice1 (with all childs,all saved to string)

      Invoice2

      Invoice3

      This is how it works now (which is a mess, as I can't tell which BaseItemIds belong to which Invoice record):

      Invoice1

      BaseItemIds (x 16)

      Invoice2

      BaseItemIds (x 16)

      ------------

      -------------

      I've been looking at an endless number of tutorials and other material, and I'm really banging my head here :/

        If you want to print all the info from the XML, then a better approach is to use the 'simplify' method of XML::Twig (or XML::Simple) to create a Perl data structure.

        use strict;
        use warnings;
        use XML::Twig;
        use Data::Dumper;

        my $twig   = XML::Twig->new();
        my $config = $twig->parsefile('invoice2.xml')->simplify();
        print Dumper($config);

        With XML::XPath I see only a more tedious way (it's true that I don't have much experience with it.)

        Also I changed the xml a little:

        <Invoices>
          <Invoice>
            <InvoiceHeader>
              <InvoiceType>IT1</InvoiceType>
              <Supplier>
                <Name>Sup1</Name>
                <OrgNumber>Org1</OrgNumber>
              </Supplier>
            </InvoiceHeader>
            <InvoiceDetails>
              <BaseItemDetails>
                <Description>description11</Description>
                <PerQuantity></PerQuantity>
              </BaseItemDetails>
              <BaseItemDetails>
                <Description>description12</Description>
                <PerQuantity>101</PerQuantity>
              </BaseItemDetails>
            </InvoiceDetails>
          </Invoice>
          <Invoice>
            <InvoiceHeader>
              <InvoiceType>IT2</InvoiceType>
              <Supplier>
                <Name>Sup1</Name>
                <OrgNumber>Org2</OrgNumber>
              </Supplier>
            </InvoiceHeader>
            <InvoiceDetails>
              <BaseItemDetails>
                <Description>description21</Description>
                <PerQuantity></PerQuantity>
              </BaseItemDetails>
              <BaseItemDetails>
                <Description>description21</Description>
                <PerQuantity>101</PerQuantity>
              </BaseItemDetails>
            </InvoiceDetails>
          </Invoice>
        </Invoices>

        use strict;
        use warnings;
        use XML::XPath;
        use XML::XPath::XMLParser;

        my $xp = XML::XPath->new(filename => 'invoice2.xml');

        my $nodeset = $xp->find('//Invoices/Invoice');

        foreach my $node ($nodeset->get_nodelist) {
            my $it = $node->find('InvoiceHeader/InvoiceType')->string_value;
            print "IT: $it\n";
            # Supplier sits inside InvoiceHeader, so the path has to start there
            my $spn = $node->find('InvoiceHeader/Supplier/Name')->string_value;
            print " SPN: $spn\n";
            # string_value on a node set yields only the first matching node
            my $des = $node->find('InvoiceDetails/BaseItemDetails/Description')->string_value;
            print " DES: $des\n";
            # ...
            # You got the idea, yes?
        }

        Good coding, Stefan

        Update: 'TwigRoots' option is not needed in the XML::Twig example. You may want to use 'forcearray', 'keyattr' or other options to get a consistent data structure.

        Update2: Removed 'TwigRoots' option from the code.
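To illustrate the remark about 'forcearray' and 'keyattr': a rough sketch, assuming a trimmed-down version of the XML above passed in as a string rather than read from invoice2.xml. With 'forcearray' set for Invoice and BaseItemDetails, those keys should always hold array references, even when an invoice has only one item, so the same loop works for every record. The variable names are made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed sample; the second invoice deliberately has a single item.
my $xml = <<'XML';
<Invoices>
  <Invoice>
    <InvoiceHeader><InvoiceType>IT1</InvoiceType></InvoiceHeader>
    <InvoiceDetails>
      <BaseItemDetails><Description>description11</Description></BaseItemDetails>
      <BaseItemDetails><Description>description12</Description></BaseItemDetails>
    </InvoiceDetails>
  </Invoice>
  <Invoice>
    <InvoiceHeader><InvoiceType>IT2</InvoiceType></InvoiceHeader>
    <InvoiceDetails>
      <BaseItemDetails><Description>description21</Description></BaseItemDetails>
    </InvoiceDetails>
  </Invoice>
</Invoices>
XML

my $twig = XML::Twig->new();
$twig->parse($xml);

# forcearray: always use array refs for these tags, even for a single child.
# keyattr => []: switch off folding of children into hashes keyed by name/id.
my $data = $twig->simplify(
    forcearray => [ 'Invoice', 'BaseItemDetails' ],
    keyattr    => [],
);

my @lines;    # collected output, one entry per printed line
for my $invoice (@{ $data->{Invoice} }) {
    push @lines, "IT: $invoice->{InvoiceHeader}{InvoiceType}";
    for my $item (@{ $invoice->{InvoiceDetails}{BaseItemDetails} }) {
        push @lines, "  DES: $item->{Description}";
    }
}

print "$_\n" for @lines;
```

Without 'forcearray', the single item of the second invoice would come back as a plain hash reference instead of a one-element array, and the inner loop would need a special case.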

        Within the loop for invoices, find('InvoiceDetails/BaseItemDetails') and loop through that list, printing the descriptions.
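A minimal sketch of that nested loop with XML::XPath might look like the following. It assumes a shortened copy of the <Invoices> sample given earlier in the thread, supplied as a string so the example is self-contained; the variable names are only illustrative.

```perl
use strict;
use warnings;
use XML::XPath;

# Shortened sample data, passed as a string instead of a file name.
my $xml = <<'XML';
<Invoices>
  <Invoice>
    <InvoiceHeader><InvoiceType>IT1</InvoiceType></InvoiceHeader>
    <InvoiceDetails>
      <BaseItemDetails><Description>description11</Description></BaseItemDetails>
      <BaseItemDetails><Description>description12</Description></BaseItemDetails>
    </InvoiceDetails>
  </Invoice>
  <Invoice>
    <InvoiceHeader><InvoiceType>IT2</InvoiceType></InvoiceHeader>
    <InvoiceDetails>
      <BaseItemDetails><Description>description21</Description></BaseItemDetails>
    </InvoiceDetails>
  </Invoice>
</Invoices>
XML

my $xp = XML::XPath->new(xml => $xml);

my @lines;    # collected output, one entry per printed line

# Outer loop: one pass per invoice
foreach my $invoice ($xp->findnodes('/Invoices/Invoice')) {
    my $type = $invoice->find('InvoiceHeader/InvoiceType')->string_value;
    push @lines, "IT: $type";

    # Inner loop: the path is relative to the current invoice,
    # so only that invoice's items are returned
    my $items = $invoice->find('InvoiceDetails/BaseItemDetails');
    foreach my $item ($items->get_nodelist) {
        my $des = $item->find('Description')->string_value;
        push @lines, "  DES: $des";
    }
}

print "$_\n" for @lines;
```

The key point is that the inner path has no leading slash: it is evaluated against the current invoice node, which is what keeps each item attached to its own record.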

        Another option, to do some advertising, would be to use XML::Rules. There are quite a few examples on PerlMonks.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.