in reply to Xpath value query

I wouldn't use XML::Parser but one of the modules more oriented towards XPath expressions, such as XML::Twig or XML::LibXML. If your data fits in memory (and will keep doing so for some time to come), XML::LibXML can be considerably faster than XML::Twig. If there is a chance that your data will not fit into memory as a DOM tree, XML::Twig is the best way to go, since it can process a document piece by piece instead of loading all of it.

Both modules will give you ways to issue XPath queries against the XML.
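As a minimal sketch of the XML::LibXML side, the following evaluates a full XPath 1.0 expression, including a function call such as substring(), with findvalue. The inline XML is a hypothetical stand-in built from the element names quoted later in this thread, not the real file. If the real file declares a default namespace, unprefixed names will not match and a registered prefix via XML::LibXML::XPathContext would be needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data; only the element and attribute names
# are taken from the XPath expressions quoted in the thread.
my $xml = <<'XML';
<TXLife>
  <OLifE>
    <Party id="Party_1">
      <Person><FirstName>Alice</FirstName></Person>
    </Party>
    <Party id="Party_2">
      <Person><FirstName>Bob</FirstName></Person>
    </Party>
    <Relation RelatedObjectID="Party_2">
      <RelationRoleCode tc="8"/>
    </Relation>
  </OLifE>
</TXLife>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# findvalue evaluates the expression and returns its string value,
# so XPath functions like substring() can be used directly.
my $xpath = 'substring(//OLifE/Party[@id=//OLifE/Relation'
          . '[RelationRoleCode/@tc=8]/@RelatedObjectID]/Person/FirstName, 1, 30)';
my $first_name = $doc->findvalue($xpath);

print "FirstName = $first_name\n";
```

findvalue is used here because the expression yields a string rather than a node list; findnodes would be the call for expressions that select elements.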

Replies are listed 'Best First'.
Re^2: Xpath value query
by SDwerner (Initiate) on Sep 17, 2015 at 18:54 UTC

    OK, so I looked at XML::Twig, and it looks like it should do what I need. My problem is that when I use the XPaths I've been given, I get errors. For example, running this against my example XML file:

    #!/usr/opt/perl5/bin/perl -sw
    use XML::Twig;

    my $twig = XML::Twig->new();
    $twig->parsefile($xmlfile);
    my $root = $twig->root;
    foreach my $i ($root->get_xpath('substring(//OLifE/Party[@id=//OLifE/Relation[RelationRoleCode/@tc=8]/@RelatedObjectID]/Person/FirstName, 1, 30)')) {
        print $i->{att}->{InvType};
        foreach my $j ($i->get_xpath('../InvCounts/InvCount')) {
            print " " . $j->{att}->{Count};
        }
        print "\n";
    }

    the result is this

    ./twigs.pl -xmlfile=a2b9f375-51fe-41a1-86ab-069561517890.xml
    error in xpath expression substring(//OLifE/Party[@id=//OLifE/Relation[RelationRoleCode/@tc=8]/@RelatedObjectID]/Person/FirstName, 1, 30) around substring(//OLifE/Party[@id=//OLifE/Relation[RelationRoleCode/@tc=8]/@RelatedObjectID]/Person/FirstName, 1, 30) at ./twigs.pl line 7

    So, I guess I don't understand how I would use the XPaths I've been provided. Do I have to find the attributes they reference first, and then substitute that into the related entry in the XPath?

      You've been given

      //Party[@id=//Relation[child::RelationRoleCode[@tc='37']]/@RelatedObjectID]/Producer/CarrierAppointment/CompanyProducerID

      as an XPath expression. Why do you think you need to wrap substring(...) around that? What is your goal here?

      Maybe start out with the simple XPath expressions, find out what XML::Twig returns and then look at how you get from a node to the values you really want.
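One way to read the "start simple" advice is sketched below. XML::Twig's get_xpath covers only a subset of XPath, so instead of one large expression with a nested query and a function call, this sketch runs simple queries and does the rest in Perl. The inline XML is hypothetical sample data using the element names from this thread, and substr stands in for the XPath substring() call.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data; the real file is not shown in the thread.
my $xml = <<'XML';
<TXLife>
  <OLifE>
    <Party id="Party_1">
      <Person><FirstName>Alice</FirstName></Person>
    </Party>
    <Party id="Party_2">
      <Person><FirstName>Bob</FirstName></Person>
    </Party>
    <Relation RelatedObjectID="Party_2">
      <RelationRoleCode tc="8"/>
    </Relation>
  </OLifE>
</TXLife>
XML

my $twig = XML::Twig->new();
$twig->parse($xml);
my $root = $twig->root;

my @names;
for my $rel ($root->get_xpath('//Relation')) {
    # Step 1: keep only relations whose role code has tc="8".
    my $role = $rel->first_child('RelationRoleCode') or next;
    my $tc   = $role->att('tc');
    next unless defined $tc && $tc eq '8';

    # Step 2: follow the reference to the matching Party element.
    my $id = $rel->att('RelatedObjectID');
    next unless defined $id;
    for my $party ($root->get_xpath(qq{//Party[\@id="$id"]})) {
        # Step 3: plain Perl replaces substring(..., 1, 30).
        for my $name ($party->get_xpath('Person/FirstName')) {
            push @names, substr($name->text, 0, 30);
        }
    }
}

print "FirstName = $_\n" for @names;
```

Printing each intermediate node list while building this up shows what XML::Twig actually returns at every step.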

        The code I used with the substring was actually the third example of XPath provided in my original post, and as provided to me by the vendor it included the substring function. Based on the complete XPath provided, "substring(//OLifE/Party[@id=//OLifE/Relation[RelationRoleCode/@tc=8]/@RelatedObjectID]/Person/FirstName, 1, 30)", I am assuming they are looking in this instance for the first 30 characters of a person's first name.

        Now, I took your suggestion to heart, and using the first XPath, "//Party[@id=//Relation[child::RelationRoleCode[@tc='37']]/@RelatedObjectID]/Producer/CarrierAppointment/CompanyProducerID", and resolving all the subqueries, I get it to run, but it does not return any values. I have to admit that, looking at the sample XML file, I'm not sure I totally understand where the data is supposed to come from, so I'm not sure I'm doing it correctly.

        here is the modified script with the expanded XPath as I believe it should be, again I'm not sure I did it correctly since I'm very new to XML.

        #!/usr/opt/perl5/bin/perl -sw
        use XML::Twig;

        my $twig = XML::Twig->new();
        $twig->parsefile($xmlfile);
        my $root = $twig->root;
        foreach my $i ($root->get_xpath('//Party/Relation_daf1bb84-658a-4bad-aff7-86d5fc755101/Party_2d205fbf-cadd-4475-9d51-a8a8aea1c625/')) {
            print $i->{att}->{InvType};
            foreach my $j ($i->get_xpath('../Producer/CarrierAppointment/CompanyProducerID')) {
                print " " . $j->{att}->{Count};
            }
            print "\n";
        }

        when I run it I get nothing. Which at least isn't an error.

        any ideas would be appreciated.

      I don't know what you are trying to extract but here's a simple example to get you started

      #!perl
      use strict;
      use XML::Twig;

      my $twig = XML::Twig->new();
      $twig->parsefile('txlife.xml');
      my $root = $twig->root;

      my @nodes1 = $root->get_xpath('//Holding');
      for my $hold (@nodes1){
        print "\nId = ".$hold->att('id')."\n";
        my @nodes2 = $hold->get_xpath('Policy/KeyedValue/KeyValue');
        for my $_ (@nodes2){
          print $_->prev_sibling_text." = ";
          print $_->text."\n";
        }
      }
      Output should be
      Id = Holding_aa2b0594-77d6-4264-b277-0218e852cb36
      AccountType = Individual
      CheckIndicator = No
      1035ExchangeIncluded = No
      SponsorName = InsCompany

      Id = Holding_f2cfb5bd-6009-4bf2-95a0-686cb69a3b7a
      OldProductType = 401K
      poj

        I had to tweak it just a bit. For some reason I was getting the error 'Can't use global $_ in "my" at ./twig3.pl line 15, near "my $_ "', so I dropped the "my" there. I also added passing the filename as a command line variable.

        #!/usr/opt/perl5/bin/perl -sw
        # use strict;
        use XML::Twig;
        # my $xmlfile=@ARGV;

        my $twig = XML::Twig->new();
        $twig->parsefile($xmlfile);
        my $root = $twig->root;

        my @nodes1 = $root->get_xpath('//Holding');
        for my $hold (@nodes1){
          print "\nId = ".$hold->att('id')."\n";
          my @nodes2 = $hold->get_xpath('Policy/KeyedValue/KeyValue');
          for $_ (@nodes2){
            print $_->prev_sibling_text." = ";
            print $_->text."\n";
          }
        }

        ./twig3.pl -xmlfile=a2b9f375-51fe-41a1-86ab-069561517890.xml

        Id = Holding_a344e55a-1471-4c90-aa94-481b434a5b12
        AccountType = UTMA_UGMA
        CheckIndicator = No
        1035ExchangeIncluded = No
        SponsorName = Transamerica

        Id = Holding_8f6e6000-6cc5-4e7a-b5e8-3acd2ee03468
        OldProductType = 401K

        This will get me on my way! Thank you, your help is much appreciated. Also a note: I've discovered during this adventure that the file is an ACORD standard file (https://www.acord.org/standards/downloads/Pages/PCSPublic1.aspx), for what it's worth. Now I just need to figure out how to dissect the sub-attributes and insert them into the XPath.

        Again, thank you.

Re^2: Xpath value query
by choroba (Cardinal) on Sep 18, 2015 at 12:41 UTC
    For massively large XML files, you can usually use XML::LibXML::Reader.
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
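A rough sketch of the XML::LibXML::Reader approach mentioned above: the reader streams through the document, and only one Holding subtree at a time is copied into memory and queried. The inline XML is hypothetical sample data shaped after the element names used earlier in the thread; a real script would more likely open the file with location => $filename.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical sample data standing in for a very large file.
my $xml = <<'XML';
<TXLife>
  <OLifE>
    <Holding id="Holding_1">
      <Policy><KeyedValue><KeyValue>Individual</KeyValue></KeyedValue></Policy>
    </Holding>
    <Holding id="Holding_2">
      <Policy><KeyedValue><KeyValue>401K</KeyValue></KeyedValue></Policy>
    </Holding>
  </OLifE>
</TXLife>
XML

my $reader = XML::LibXML::Reader->new(string => $xml);

my @ids;
# nextElement skips ahead to the next element with the given name
# and returns 1 when it finds one.
while ($reader->nextElement('Holding') == 1) {
    # Deep-copy only the current subtree so XPath can be used on it.
    my $holding = $reader->copyCurrentNode(1);
    my $id      = $holding->getAttribute('id');
    push @ids, $id;

    print "Id = $id\n";
    for my $value ($holding->findnodes('Policy/KeyedValue/KeyValue')) {
        print "  KeyValue = ", $value->textContent, "\n";
    }
}
```

Memory use then depends on the size of the largest Holding subtree rather than on the whole document.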