CA_Tallguy has asked for the wisdom of the Perl Monks concerning the following question:

Oh mighty Perl Monks, please assist me through this terribly frustrating (yet likely very simple to solve) roadblock. I'm trying to access data in a very, very large XML file so I can process it and selectively load it into a database. I am able to print out the data I am after, but I can't seem to assign it to a variable that will persist. I need to collect all the fields for the database row in order to format my insert. Thanks in advance for any wisdom you might share with me.

#!/usr/bin/perl
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new(location => "parts.xml")
    or die "cannot read file.xml\n";

while($reader->read) {
    ## VARIABLE ASSIGNMENT DOES NOT PERSIST
    $price = $reader->readInnerXml if $reader->localName eq 'price';

    ## THESE PRINT EXPECTED VALUES
    print $reader->readInnerXml if $reader->localName eq 'price';
    print $reader->readInnerXml if $reader->localName eq 'url';
    print $reader->readInnerXml if $reader->localName eq 'imageurl';
    print $reader->readInnerXml if $reader->localName eq 'name';
}

The XML looks like this....

<product>
<name>5 Spoke Wheel</name>
<description>Reconditioned OEM</description>
<price>123.45</price>
<url>http://www.foo.com</buyurl>
<imageurl>http://www.foo.com/foo.jpg</imageurl>]
</product>

Re: Assigning variables and persistence using Lib::LibXML::Reader
by AppleFritter (Vicar) on Apr 02, 2017 at 21:38 UTC

    By "persist", do you mean that the data read from the file will be available after the while loop's done? If so, declare your variable(s) before the loop, outside the loop's body.

    EDIT: OK, scratch that, I misunderstood your question. (It's a Sunday night, that's my excuse and I'm sticking to it.) Looking at your sample XML snippet (not well-formed, BTW), it seems that $reader->localName equals "price" twice, when the price tag gets opened and when it gets closed. So $price gets set correctly, but then overwritten again.

    The easiest (quickest, dirtiest) way to deal with that is to use ||= or //=:

    while($reader->read) {
        $price //= $reader->readInnerXml if $reader->localName eq 'price';
    }
    print $price;

    This will only assign to $price if $price is false (||=) or undefined (//=), and leave it be otherwise.
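The difference between the two operators only shows up when the first value is defined but false. A minimal sketch, using made-up values rather than anything from the question's file (note that //= needs Perl 5.10 or later):

```perl
use strict;
use warnings;

# Hypothetical values in the order a loop might see them.
my @seen = ('0', '19.99');

my ($or_price, $dor_price);
for my $value (@seen) {
    $or_price  ||= $value;   # '0' is false, so the next value replaces it
    $dor_price //= $value;   # '0' is defined, so the first value is kept
}

print "||= kept: $or_price\n";
print "//= kept: $dor_price\n";
```

Either way, once the variable holds a value it is never reassigned, so in a file with many products this keeps only the first price unless the variable is reset for each product.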

      If so, declare your variable(s) before the loop, outside the loop's body.

      If he doesn't use strict, the undeclared variable $price is a package global and will persist after the loop, carrying the value of the last assignment:

      # look ma, no strict
      for (1..4) {
          $price = $_;
      }
      print $price,$/
      __END__
      4
      perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: Assigning variables and persistence using Lib::LibXML::Reader
by Anonymous Monk on Apr 02, 2017 at 21:53 UTC
    Put some better debug stuff in the loop to see what is happening: $reader->read also reads closing tags.
    use Data::Dump;
    my $price;
    while($reader->read) {
        next unless $reader->nodeType==XML_READER_TYPE_ELEMENT;
        dd $reader->localName, $reader->readInnerXml;
        $price = $reader->readInnerXml if $reader->localName eq 'price';
    }
    dd "price", $price;
    Output:
    (
      "product",
      "\n<name>5 Spoke Wheel</name>\n<description>Reconditioned OEM</description>\n<price>123.45</price>\n<url>http://www.foo.com</url>\n<imageurl>http://www.foo.com/foo.jpg</imageurl>]\n",
    )
    ("name", "5 Spoke Wheel")
    ("description", "Reconditioned OEM")
    ("price", 123.45)
    ("url", "http://www.foo.com")
    ("imageurl", "http://www.foo.com/foo.jpg")
    ("price", 123.45)
    BTW your XML is broken. </buyurl> should be </url>.
Re: Assigning variables and persistence using Lib::LibXML::Reader
by Anonymous Monk on Apr 03, 2017 at 03:18 UTC

    Hi,

    Use the twig good buddy, no matter how big the file

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use XML::Twig;

    my $xml = q{<product>
    <name>5 Spoke Wheel</name>
    <description>Reconditioned OEM</description>
    <price>123.45</price>
    <buyurl>http://www.foo.com</buyurl>
    <imageurl>http://www.foo.com/foo.jpg</imageurl>
    </product>};
    $xml = join '', '<root>',$xml, $xml, $xml, '</root>';

    XML::Twig->new(
        twig_roots => {
            '/root/product' => sub {
                print join "\n",
                    $_->path,
                    $_->first_child('name'   )->text,
                    $_->first_child('price'  )->text,
                    $_->first_child('buyurl' )->text,
                    "\n";
            },
        },
    )->parse( $xml );
Re: Assigning variables and persistence using Lib::LibXML::Reader (twig)
by Anonymous Monk on Apr 03, 2017 at 03:39 UTC

    persistence is futile

    use the twiggyness of XML::LibXML::Reader

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Path::Tiny qw/ path /;
    use XML::LibXML::Reader;

    my $deleteme = 'deleteme.xml';
    my $xml = q{<root><product>
    <name>5 Spoke Wheel</name>
    <description>Reconditioned OEM</description>
    <price>123.45</price>
    <buyurl>http://foo.example.com</buyurl>
    <imageurl>http://foo.example.com/foo.jpg</imageurl>
    </product><product>
    <name>sixexample</name>
    <description>six example</description>
    <price>66666.66</price>
    <buyurl>http://six.example.com</buyurl>
    <imageurl>http://six.example.com/foo.jpg</imageurl>
    </product><product>
    <name>two</name>
    <description>two</description>
    <price>222.45</price>
    <buyurl>http://two.example.com</buyurl>
    <imageurl>http://two.example.com/foo.jpg</imageurl>
    </product></root>};
    path( $deleteme )->spew_raw( $xml );

    my $reader = XML::LibXML::Reader->new(location => $deleteme )
        or die "cannot read file.xml\n";

    my $pattern = XML::LibXML::Pattern->new('/root/product');
    while($reader->nextPatternMatch( $pattern) ) {
        my $node = $reader->copyCurrentNode(!!'deep'); ## get the twig
        next if ! $node ->hasChildNodes; ## skip empty like closing tags
        processNode( $node ) ;
    }
    $reader->close;
    undef $reader;
    path( $deleteme )->remove;
    exit( 0 );

    sub processNode {
        my( $product ) = @_;
        my $price = $product->F('./price/text()');
        my $name = $product->F('./name/text()');
        my $imageurl = $product->F('./imageurl/text()');
        print "$price $name $imageurl\n\n";
    }

    sub XML::LibXML::Node::F {
        my $self = shift;
        my $xpath = shift;
        my %prefix = @_;
        our $XPATHCONTEXT;
        $XPATHCONTEXT ||= XML::LibXML::XPathContext->new();
        while( my( $p, $u ) = each %prefix ){
            $XPATHCONTEXT->registerNs( $p, $u );
        }
        $XPATHCONTEXT->findnodes( $xpath, $self );
    }
    __END__
    123.45 5 Spoke Wheel http://foo.example.com/foo.jpg

    66666.66 sixexample http://six.example.com/foo.jpg

    222.45 two http://two.example.com/foo.jpg
Re: Assigning variables and persistence using Lib::LibXML::Reader
by Anonymous Monk on Apr 03, 2017 at 00:07 UTC

    THANK YOU ALL!! I was banging my head against the wall all day over this. I tried to declare the variable every which way and I couldn't figure out why it wasn't working. The debug tips are great. The problem was indeed that the reader visits both the opening and the closing tags, so using //= worked, as did adding a nodeType check:

    $price = $reader->readInnerXml if $reader->localName eq 'price' && $reader->nodeType == 1;

    Whew! Glad that is over with. This has been very educational. Been a long time lurker and learner from all your wisdom here. Thanks so much.

      Please use the constants exported by XML::LibXML instead of 1!!!

        They haven't changed in more than a decade!
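For what it's worth, here is a minimal sketch of the same check written with the named constants, extended to collect one hash per product so that all fields for a database row are available together. The sample XML, the second product, the %wanted list, and the @rows/%row names are assumptions made up for illustration; the real file's layout may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML::Reader qw(:types);   # import the XML_READER_TYPE_* constants

# Hypothetical sample data; field names follow the question's XML.
my $xml = <<'XML';
<root>
<product><name>5 Spoke Wheel</name><price>123.45</price><url>http://www.foo.com</url></product>
<product><name>Hub Cap</name><price>9.99</price><url>http://www.foo.com/cap</url></product>
</root>
XML

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

# Fields to keep for each row (an assumption about what the insert needs).
my %wanted = map { $_ => 1 } qw(name price url imageurl);
my (@rows, %row);

while ($reader->read) {
    my $type = $reader->nodeType;
    my $name = $reader->localName;

    if ($type == XML_READER_TYPE_ELEMENT && $wanted{$name}) {
        # Assign only on the opening tag, never on the closing one.
        $row{$name} = $reader->readInnerXml;
    }
    elsif ($type == XML_READER_TYPE_END_ELEMENT && $name eq 'product') {
        # </product> marks a complete row: store a copy and start over.
        push @rows, {%row};
        %row = ();
    }
}

printf "%s costs %s\n", $_->{name}, $_->{price} for @rows;
```

Resetting %row at each closing product tag is what lets the values persist for exactly one row; in a real loader the push would presumably be replaced by the database insert.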