in reply to XML::LibXML out of memory

Unfortunately, you didn't say why the file is so large, i.e. which part of the structure is repeated many times. If it's the termEntry, converting your code to use XML::LibXML::Reader is rather easy:
#! /usr/bin/perl
use warnings;
use strict;

use XML::LibXML::Reader;

print "Importing...\n";

my $file = 'my.xml';

my $reader = 'XML::LibXML::Reader'->new(location => $file) or die;
my $entry_pattern = 'XML::LibXML::Pattern'->new('/martif/text/body/termEntry');

while ($reader->nextPatternMatch($entry_pattern)) {
    my $termEntry = $reader->copyCurrentNode(1);
    for my $lang_set ($termEntry->findnodes('langSet')) {
        my $language = $lang_set->getAttribute('xml:lang');
        for my $term_grp ($lang_set->findnodes('./tig')) {
            my $term = $term_grp->findvalue('./term');
            print "$language: $term\n";
        }
    }
}

print "Done!\n";

Tested with the following input:

<martif>
  <text>
    <body>
      <termEntry>
        <langSet xml:lang="en">
          <tig><term>English</term></tig>
          <tig><term>Saesneg</term></tig>
        </langSet>
        <langSet xml:lang="cs">
          <tig><term>Czech</term></tig>
          <tig><term>Tsieceg</term></tig>
        </langSet>
        <langSet xml:lang="de">
          <tig><term>German</term></tig>
          <tig><term>Almaeneg</term></tig>
        </langSet>
      </termEntry>
    </body>
  </text>
</martif>

Reader is a pull parser, so it doesn't need to load the whole file into memory. While walking the document, you can ask it to inflate the current node into a full DOM object, which is what copyCurrentNode(1) does; the argument 1 requests a deep copy that includes the node's subtree.
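
As a minimal sketch of the same pull-parsing loop, the script below reads from an in-memory string instead of a file and handles two termEntry elements. The sample data and the @lines array are made up for illustration. The nodeType guard is an assumption about what you want: it skips positions where the reader stops on something other than an opening element, so only real entries are inflated.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML::Reader;

# Hypothetical sample data, kept in a string so the sketch is self-contained.
my $xml = <<'XML';
<martif><text><body>
  <termEntry>
    <langSet xml:lang="en"><tig><term>English</term></tig></langSet>
    <langSet xml:lang="cs"><tig><term>Czech</term></tig></langSet>
  </termEntry>
  <termEntry>
    <langSet xml:lang="de"><tig><term>German</term></tig></langSet>
  </termEntry>
</body></text></martif>
XML

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "Cannot create reader";
my $entry_pattern = XML::LibXML::Pattern->new('/martif/text/body/termEntry');

my @lines;    # collected "language: term" strings (illustrative name)
while ($reader->nextPatternMatch($entry_pattern)) {
    # Only inflate opening elements; skip any other node type the pattern matches.
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;

    # Deep-copy just this entry into a small DOM fragment.
    my $termEntry = $reader->copyCurrentNode(1);
    for my $lang_set ($termEntry->findnodes('langSet')) {
        my $language = $lang_set->getAttribute('xml:lang');
        for my $term_grp ($lang_set->findnodes('./tig')) {
            push @lines, "$language: " . $term_grp->findvalue('./term');
        }
    }
}

print "$_\n" for @lines;
```

Only one termEntry is held as a DOM fragment at a time, which is what keeps memory use flat for a large file.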


Re^2: XML::LibXML out of memory
by Anonymous Monk on Mar 24, 2022 at 11:52 UTC
    I made a slight mod to your code as I'm experiencing a problem.
    #! /usr/bin/perl
    use warnings;
    use strict;

    use XML::LibXML::Reader;

    print "Importing...\n";

    my $file = 'my.xml';

    my $reader = 'XML::LibXML::Reader'->new(location => $file) or die;
    my $entry_pattern = 'XML::LibXML::Pattern'->new('/martif/text/body/termEntry');

    while ($reader->nextPatternMatch($entry_pattern)) {
        my $termEntry = $reader->copyCurrentNode(1);
        print "$termEntry\n";
        for my $lang_set ($termEntry->findnodes('langSet')) {
            my $language = $lang_set->getAttribute('xml:lang');
            for my $term_grp ($lang_set->findnodes('./tig')) {
                my $term = $term_grp->findvalue('./term');
                print "$language: $term\n";
            }
        }
    }

    print "Done!\n";
    I get this result, but with an interesting empty(ish) node just before "Done!":
    Importing...
    <termEntry>
      <langSet xml:lang="en">
        <tig><term>English</term></tig>
        <tig><term>Saesneg</term></tig>
      </langSet>
      <langSet xml:lang="cs">
        <tig><term>Czech</term></tig>
        <tig><term>Tsieceg</term></tig>
      </langSet>
      <langSet xml:lang="de">
        <tig><term>German</term></tig>
        <tig><term>Almaeneg</term></tig>
      </langSet>
    </termEntry>
    en: English
    en: Saesneg
    cs: Czech
    cs: Tsieceg
    de: German
    de: Almaeneg
    <termEntry/>
    Done!

    Is this expected behaviour? I can't find any direct reference explaining why this should be the case.

    I've had some help on Stack Exchange, which suggested this was normal behaviour, but I thought I'd ask for a second opinion.

    The documentation at https://metacpan.org/dist/XML-LibXML/view/lib/XML/LibXML/Reader.pod#nextPatternMatch-(compiled_pattern) says that nextPatternMatch should "Skip nodes following the current one in the document order until an element matching a given compiled pattern is reached."

    This is ambiguous, since it doesn't specify whether the matched node is of type XML_READER_TYPE_ELEMENT, XML_READER_TYPE_END_ELEMENT, or either.

    I'm wondering if I should report a bug?

      You can check the node type in the loop condition (the loop then ends at the first match that is not an opening element):
      while ($reader->nextPatternMatch($entry_pattern) && $reader->nodeType == XML_READER_TYPE_ELEMENT ) {

      or, if more than one termEntry is expected,

      while ($reader->nextPatternMatch($entry_pattern)) {
          if ($reader->nodeType == XML_READER_TYPE_ELEMENT) {
              ...


        Thanks.

        I got round the problem with the following line at the top of the loop body, as recommended to me:

        next if $reader->nodeType != XML_READER_TYPE_ELEMENT;