Ppeoc has asked for the wisdom of the Perl Monks concerning the following question:

I have been using Perl for two weeks and I am trying to parse a 300 MB nested XML file, so please excuse my lack of knowledge. The file follows a format similar to the one below:

<?xml version="1.0" encoding="UTF-8"?>
<APP:Report xmlns:APP="WWW" xmlns:xsi="WWW" xsi:schemaLocation="WWW">
  <library>
    <Cat1>
      <Book>The book of pages</Book>
      <Snap></Snap>
      <Line1>The Beginning</Line1>
      <Line2>We ceased to exist</Line2>
      <Line3>Accept it</Line3>
      <Line4>Now we live</Line4>
      <Line5>We reject it</Line5>
      <Rating>
        <C1>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
        </C1>
        <C2>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
        </C2>
        <C3>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
        </C3>
      </Rating>
    </Cat1>
    <Author>Sally</Author>
    <Publisher>Penguin</Publisher>
    <Cat2>
      <Book>The song</Book>
      <Snap></Snap>
      <Line1>This is how we do it</Line1>
      <Line2>I hope this works</Line2>
      <Line3>Please do</Line3>
      <Line4>Begging you</Line4>
      <Line5>Bye</Line5>
      <Rating>
        <C1>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
        </C1>
        <C2>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
        </C2>
        <C3>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
        </C3>
      </Rating>
    </Cat2>
    <Author>Justin</Author>
    <Publisher>Victoria</Publisher>
  </library>
</APP:Report>

I want to display Book, Snap, Line1, Line2, Line3, Line4, Line5, C1, C2 and C3 in separate columns of the first row, Author in row 2 and Publisher in row 3. This is just a sample of the big file that I have. I do not want to pick out one specific child to display; I want to display all of an element's descendants. Currently my code prints all of the data in row 1, column 1. My code snippet is below. What would be the best way to do this? I would greatly appreciate any advice. Thank you!

my $twig = XML::Twig->new();
$twig->parsefile($_);    # build the twig
foreach my $elt ($twig->root->children) {
    print $fout1 $elt->text . "\n";
}

I have edited my question. Suppose I have different types of nested children within each book. How do I display and access these nested children? I want to display it as
The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C1.X|X 6 times
The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C1.Y|X 6 times
The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C2.X|X 6 times
The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C2.Y|X 6 times
The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C3.X|X 6 times
The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C3.Y|X 6 times
. . . . .
The song|Snap|Line1|Line2|Line3|Line4|Line5|C2.X|X 6
The song|Snap|Line1|Line2|Line3|Line4|Line5|C2.Y|X 6

Replies are listed 'Best First'.
Re: how to display descendants in XML with Perl's XML::Twig
by toolic (Bishop) on Oct 15, 2015 at 21:03 UTC
Re: how to display descendants in XML with Perl's XML::Twig
by choroba (Cardinal) on Oct 15, 2015 at 21:19 UTC
    Using an alternative module, XML::LibXML::Reader:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my @els = qw( Book Snap Line1 Line2 Line3 Line4 Line5 C1 C2 C3 );

    use XML::LibXML::Reader;

    my $reader = XML::LibXML::Reader->new(location => 'file.xml') or die;
    my %line1;
    while ($reader->read) {
        next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
        my $name = $reader->name;
        if (grep $_ eq $name, @els) {
            $line1{$name} = $reader->copyCurrentNode(1)->textContent;
        } elsif ('Author' eq $name) {
            print join "\t", map $_ // '??', @line1{@els};
            %line1 = ();
            print "\n", $reader->copyCurrentNode(1)->textContent, "\n";
        } elsif ('Publisher' eq $name) {
            print $reader->copyCurrentNode(1)->textContent, "\n";
        }
    }

    Update: Or, similarly, using stream from XML::XSH2:

    stream :f 'file.xml' :F '/dev/null'
        select elt {
            echo :s (Book) {"\t"} (Snap) {"\t"} (Line1) {"\t"} (Line2) {"\t"} (Line3) {"\t"} (Line4) {"\t"} (Line5) {"\t"} (Rating/C1) {"\t"} (Rating/C2) {"\t"} (Rating/C3) ;
        }
        select Author {
            echo (.) ;
        }
        select Publisher {
            echo (.) ;
        } ;
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Thanks for the code. The problem with this approach is that the XML file I am working on is 300 MB and highly nested, and each child has a different level of nesting, so the if/else chain would be tedious to maintain. My other question: what if I had nested children within nested children? What would be the most efficient way to handle that? For example, how do I access the elements of each elt under each C? My second question is how I could display these elements like this:
      The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C1.X|
      The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C1.Y|
      The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C2.X|
      The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C2.Y|
      The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C3.X|
      The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C3.Y|
      . . . . .
      The song|Snap|Line1|Line2|Line3|Line4|Line5|C2.X|
      The song|Snap|Line1|Line2|Line3|Line4|Line5|C2.Y|

      Example:
      <Rating>
        <C1>
          <elt> <X></X> <X></X> </elt>
          <elt>
          <elt>
        </C1>
        <C2>
          <elt>
          <elt>
          <elt>
        </C2>
        <C3>
          <elt>
          <elt>
          <elt>
        </C3>
      </Rating>
      Is there another XML module that can be more helpful than XML::Twig for this purpose?
        I don't understand. Can you post a somewhat larger input that I can test my code against? Use the <readmore> tags to save readers from excessive scrolling.
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
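
For readers following the thread, here is a minimal sketch of how the requested pipe-separated layout might be produced with XML::Twig itself. The original snippet prints one blob because the only child of the root is <library>, and ->text on it returns the concatenated text of every descendant. The sketch instead handles each category element as soon as its end tag is parsed and then calls purge, so the whole 300 MB tree should not need to be held in memory. It assumes that categories are named Cat followed by digits, and that every group under <Rating> holds <elt> children with <X> and <Y>. The names rows_for_cat, process_xml, @fixed and $emit are made up for illustration and do not come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Leading columns of every row, in output order.
my @fixed = qw(Book Snap Line1 Line2 Line3 Line4 Line5);

# Build the pipe-separated rows for one <CatN> element: one row per
# rating group (C1, C2, ...) and per coordinate (X, Y).
sub rows_for_cat {
    my ($cat) = @_;
    my @prefix = map { $cat->first_child_text($_) } @fixed;
    my $rating = $cat->first_child('Rating') or return;
    my @rows;
    for my $group ($rating->children('#ELT')) {    # element children only
        for my $coord (qw(X Y)) {
            my @values = map { $_->first_child_text($coord) }
                         $group->children('elt');
            push @rows, join '|', @prefix, $group->tag . ".$coord", @values;
        }
    }
    return @rows;
}

# Parse an XML string and pass each finished output row to $emit.
sub process_xml {
    my ($xml, $emit) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # '_all_' is called for every element once it is fully parsed.
            _all_ => sub {
                my ($t, $elt) = @_;
                my $tag = $elt->tag;
                if ($tag =~ /^Cat\d+$/) {    # assumption: Cat1, Cat2, ...
                    $emit->($_) for rows_for_cat($elt);
                    $t->purge;               # free what has been handled
                }
                elsif ($tag eq 'Author' || $tag eq 'Publisher') {
                    $emit->($elt->text);
                    $t->purge;
                }
            },
        },
    );
    $twig->parse($xml);
    return;
}

# Trimmed-down sample in the shape of the posted file.
my $sample_xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<APP:Report xmlns:APP="WWW">
  <library>
    <Cat1>
      <Book>The book of pages</Book>
      <Snap></Snap>
      <Line1>The Beginning</Line1>
      <Line2>We ceased to exist</Line2>
      <Line3>Accept it</Line3>
      <Line4>Now we live</Line4>
      <Line5>We reject it</Line5>
      <Rating>
        <C1>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
        </C1>
        <C2>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
        </C2>
      </Rating>
    </Cat1>
    <Author>Sally</Author>
    <Publisher>Penguin</Publisher>
  </library>
</APP:Report>
XML

process_xml($sample_xml, sub { print "$_[0]\n" });
```

For the real file, $twig->parsefile('file.xml') would take the place of $twig->parse($xml). Passing the rows to a callback rather than collecting them in an array keeps memory use roughly proportional to one category at a time, which is the main reason for pairing the handler with purge.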