PerlMonks  

Re^2: XML::LibXML out of memory

by Anonymous Monk
on Mar 24, 2022 at 11:52 UTC


in reply to Re: XML::LibXML out of memory
in thread XML::LibXML out of memory

I made a slight mod to your code as I'm experiencing a problem.
#! /usr/bin/perl
use warnings;
use strict;

use XML::LibXML::Reader;

print "Importing...\n";

my $file = 'my.xml';
my $reader = 'XML::LibXML::Reader'->new(location => $file) or die;
my $entry_pattern = 'XML::LibXML::Pattern'->new('/martif/text/body/termEntry');

while ($reader->nextPatternMatch($entry_pattern)) {
    my $termEntry = $reader->copyCurrentNode(1);
    print "$termEntry\n";
    for my $lang_set ($termEntry->findnodes('langSet')) {
        my $language = $lang_set->getAttribute('xml:lang');
        for my $term_grp ($lang_set->findnodes('./tig')) {
            my $term = $term_grp->findvalue('./term');
            print "$language: $term\n";
        }
    }
}

print "Done!\n";
I get this result, but with an interesting empty (ish) node just before "Done!"
Importing...
<termEntry>
  <langSet xml:lang="en">
    <tig><term>English</term></tig>
    <tig><term>Saesneg</term></tig>
  </langSet>
  <langSet xml:lang="cs">
    <tig><term>Czech</term></tig>
    <tig><term>Tsieceg</term></tig>
  </langSet>
  <langSet xml:lang="de">
    <tig><term>German</term></tig>
    <tig><term>Almaeneg</term></tig>
  </langSet>
</termEntry>
en: English
en: Saesneg
cs: Czech
cs: Tsieceg
de: German
de: Almaeneg
<termEntry/>
Done!

Is this expected behaviour? I can't find any direct reference explaining why this should be the case.

I've had some help on Stack Exchange, which suggested this is normal behaviour, but I thought I'd ask for a second opinion.

The documentation at https://metacpan.org/dist/XML-LibXML/view/lib/XML/LibXML/Reader.pod#nextPatternMatch-(compiled_pattern) says that nextPatternMatch should "Skip nodes following the current one in the document order until an element matching a given compiled pattern is reached."

This is ambiguous, since it doesn't specify whether the match happens on "XML_READER_TYPE_ELEMENT", on "XML_READER_TYPE_END_ELEMENT", or on either.

I'm wondering if I should report a bug?
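
To make the question concrete, here is a minimal sketch that records the node type at every position where nextPatternMatch stops. The inline XML string and the @types array are made up for illustration; they are not from my real file. If the behaviour is as I described, it should stop twice on the single termEntry: once on the start tag and once on the end tag.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML::Reader;

# Hypothetical sample input, modelled on the output shown above.
my $xml = <<'XML';
<martif><text><body>
<termEntry><langSet xml:lang="en"><tig><term>English</term></tig></langSet></termEntry>
</body></text></martif>
XML

my $reader  = 'XML::LibXML::Reader'->new(string => $xml) or die;
my $pattern = 'XML::LibXML::Pattern'->new('/martif/text/body/termEntry');

# Collect the node type of each match so the start/end pairing is visible.
my @types;
while ($reader->nextPatternMatch($pattern)) {
    push @types, $reader->nodeType;
    printf "%s: nodeType %d\n", $reader->name, $reader->nodeType;
}
```

The numbers printed can be compared against the XML_READER_TYPE_* constants that XML::LibXML::Reader exports.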

Re^3: XML::LibXML out of memory
by choroba (Cardinal) on Mar 24, 2022 at 12:01 UTC
    You can check the nodetype in the condition:
    while ($reader->nextPatternMatch($entry_pattern) && $reader->nodeType == XML_READER_TYPE_ELEMENT ) {

    or, if more than one termEntry is expected,

    while ($reader->nextPatternMatch($entry_pattern)) {
        if ($reader->nodeType == XML_READER_TYPE_ELEMENT) {
            ...

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      Thanks.

      I got round the problem with

      next if $reader->nodeType != XML_READER_TYPE_ELEMENT;

      As recommended to me.
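
For anyone finding this later, a rough sketch of how that line fits into the loop from my earlier post. The inline XML and the @found array are placeholders for illustration only; my real script reads from a file via location => $file.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML::Reader;

# Hypothetical sample with two termEntry elements.
my $xml = <<'XML';
<martif><text><body>
<termEntry><langSet xml:lang="en"><tig><term>English</term></tig><tig><term>Saesneg</term></tig></langSet></termEntry>
<termEntry><langSet xml:lang="cs"><tig><term>Czech</term></tig></langSet></termEntry>
</body></text></martif>
XML

my $reader = 'XML::LibXML::Reader'->new(string => $xml) or die;
my $entry_pattern = 'XML::LibXML::Pattern'->new('/martif/text/body/termEntry');

my @found;
while ($reader->nextPatternMatch($entry_pattern)) {
    # Skip the match on the closing tag, which would copy as an empty node.
    next if $reader->nodeType != XML_READER_TYPE_ELEMENT;

    my $termEntry = $reader->copyCurrentNode(1);
    for my $lang_set ($termEntry->findnodes('langSet')) {
        my $language = $lang_set->getAttribute('xml:lang');
        for my $term_grp ($lang_set->findnodes('./tig')) {
            my $term = $term_grp->findvalue('./term');
            push @found, "$language: $term";
            print "$language: $term\n";
        }
    }
}
```

Unlike putting the nodeType test in the while condition, skipping with next keeps the loop going after the first closing tag, so later termEntry elements are still processed.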
