Ppeoc has asked for the wisdom of the Perl Monks concerning the following question:
I have been using Perl for two weeks and I am trying to parse a 300 MB nested XML file, so please excuse my lack of knowledge. The file follows a format similar to the one below:
<?xml version="1.0" encoding="UTF-8"?>
<APP:Report xmlns:APP="WWW" xmlns:xsi="WWW" xsi:schemaLocation="WWW">
  <library>
    <Cat1>
      <Book>The book of pages</Book>
      <Snap></Snap>
      <Line1>The Beginning</Line1>
      <Line2>We ceased to exist</Line2>
      <Line3>Accept it</Line3>
      <Line4>Now we live</Line4>
      <Line5>We reject it</Line5>
      <Rating>
        <C1>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
        </C1>
        <C2>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
        </C2>
        <C3>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
        </C3>
      </Rating>
    </Cat1>
    <Author>Sally</Author>
    <Publisher>Penguin</Publisher>
    <Cat2>
      <Book>The song</Book>
      <Snap></Snap>
      <Line1>This is how we do it</Line1>
      <Line2>I hope this works</Line2>
      <Line3>Please do</Line3>
      <Line4>Begging you</Line4>
      <Line5>Bye</Line5>
      <Rating>
        <C1>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
        </C1>
        <C2>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
        </C2>
        <C3>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>3.5</X> <Y>13.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
          <elt> <X>10.5</X> <Y>11.4</Y> </elt>
        </C3>
      </Rating>
    </Cat2>
    <Author>Justin</Author>
    <Publisher>Victoria</Publisher>
  </library>
</APP:Report>
I want to display Book, Snap, Line1, Line2, Line3, Line4, Line5, C1, C2 and C3 in separate columns of the first row, Author in row 2 and Publisher in row 3. This is just a sample of the big file that I have. I do not want to access a specific child by name; I want to display all of its descendants. Currently all of my data is printed in row 1, column 1. My code snippet is enclosed below. What would be the best way to do this? I would greatly appreciate any advice. Thank you!
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parsefile($_);    # build the twig
foreach my $elt ($twig->root->children) {
    # $fout1 is an output filehandle opened elsewhere in the script
    print $fout1 $elt->text . "\n";
}

I have edited my question. Suppose I have different types of nested children within each book. How do I display and access these nested children? I want to display it as
The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C1.X|X 6 times
The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C1.Y|X 6 times
The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C2.X|X 6 times
The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C2.Y|X 6 times
The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C3.X|X 6 times
The book of pages|Snap|Line1|Line2|Line3|Line4|Line5|C3.Y|X 6 times
. . . . .
The song|Snap|Line1|Line2|Line3|Line4|Line5|C2.X|X 6
The song|Snap|Line1|Line2|Line3|Line4|Line5|C2.Y|X 6
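One way the requested layout might be produced is sketched below. It is only an illustration under stated assumptions, not the approach given in the replies: the helper name `rows_for_category`, the `/^Cat\d+$/` tag pattern, and the trimmed inline sample are all made up for this example, and the six values are simply joined with the same `|` separator. It uses the XML::Twig navigation calls `children`, `first_child`, `first_child_text`, `tag` and `text`.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: build pipe-delimited rows for one <CatN> element.
sub rows_for_category {
    my ($cat) = @_;

    # Text of every direct child element except <Rating>, in document order.
    my @prefix = map { $_->text }
                 grep { $_->tag ne 'Rating' } $cat->children('#ELT');

    my $rating = $cat->first_child('Rating')
        or return join '|', @prefix;

    my @rows;
    for my $group ($rating->children('#ELT')) {    # C1, C2, C3, ...
        for my $axis (qw(X Y)) {
            # Collect this axis from every <elt> in the group.
            my @values = map { $_->first_child_text($axis) }
                         $group->children('elt');
            push @rows, join '|', @prefix, $group->tag . ".$axis", @values;
        }
    }
    return @rows;
}

# Trimmed stand-in for the real file (assumption: same nesting as the sample).
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<APP:Report xmlns:APP="WWW">
  <library>
    <Cat1>
      <Book>The book of pages</Book>
      <Snap></Snap>
      <Line1>The Beginning</Line1>
      <Rating>
        <C1>
          <elt><X>10.5</X><Y>11.4</Y></elt>
          <elt><X>3.5</X><Y>13.4</Y></elt>
        </C1>
      </Rating>
    </Cat1>
    <Author>Sally</Author>
    <Publisher>Penguin</Publisher>
  </library>
</APP:Report>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);

my @lines;
for my $child ($twig->root->first_child('library')->children('#ELT')) {
    if ($child->tag =~ /^Cat\d+$/) {
        push @lines, rows_for_category($child);    # one row per C<n>.X / C<n>.Y
    }
    else {
        push @lines, $child->text;                 # Author, Publisher: own row
    }
}
print "$_\n" for @lines;
```

This sketch loads the whole document into memory, which may not be practical for a 300 MB file. In that case the same helper could probably be called from a `twig_handlers` callback, with `$twig->purge` after each category to release the parts already processed.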
Replies are listed 'Best First'.

Re: how to display descendants in XML with Perl's XML::Twig
by toolic (Bishop) on Oct 15, 2015 at 21:03 UTC

Re: how to display descendants in XML with Perl's XML::Twig
by choroba (Cardinal) on Oct 15, 2015 at 21:19 UTC
by Ppeoc (Beadle) on Oct 16, 2015 at 13:20 UTC
by choroba (Cardinal) on Oct 16, 2015 at 13:45 UTC
by Ppeoc (Beadle) on Oct 16, 2015 at 16:12 UTC
by Ppeoc (Beadle) on Oct 16, 2015 at 17:35 UTC
by poj (Abbot) on Oct 16, 2015 at 20:33 UTC
by choroba (Cardinal) on Oct 18, 2015 at 20:31 UTC