seki has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks,
I am trying to split an XML file into multiple well-formed fragments. An old solution given here, Re: XML::Twig - Filtering and duplexing output to multiple output files, does pretty much what I am looking for, with the help of XML::Twig writing into an IO::Tee... at least with simple input data.
If I complicate the data structure a little by grouping the nodes to filter under a parent node, the second file is not well-formed: the parent node is missing its opening tag. And I am quite lost as to the cause.
SSCCE (the difference with the initial example is the <thing_list> that contains the <thing>'s):

    use XML::Twig;
    use IO::Tee;
    use feature 'say';

    open my $frufile, '>', 'fruit.xml' or die "fruit $!";
    open my $vegfile, '>', 'veg.xml' or die "veg $!";
    my $tee = IO::Tee->new($frufile, $vegfile);
    select $tee;

    my $twig=XML::Twig->new(
        twig_handlers => {
            thing => \&magic,
            _default_ => sub {
                say STDOUT '_default_ for '.$_->name;
                $_[0]->flush($tee); #default filehandle = tee
                1;
            },
        },
        pretty_print => 'indented',
        empty_tags => 'normal',
    );

    $twig->parse( *DATA );

    sub magic {
        my ($thing, $element) = @_;
        say STDOUT "magic for ". $element->{att}{type};
        for ($element->{att}{type}) {
            if (/fruit/) {
                $thing->flush($frufile);
            } elsif (/vegetable/) {
                $thing->flush($vegfile);
            } else {
                $thing->purge;
            }
        }
        1;
    }

    __DATA__
    <batch>
      <header>
        <foo>1</foo>
        <bar>2</bar>
        <baz>3</baz>
      </header>
      <thing_list>
        <thing type="fruit" >Im an apple!</thing>
        <thing type="city" >Toronto</thing>
        <thing type="vegetable" >Im a carrot!</thing>
        <thing type="city" >Melrose</thing>
        <thing type="vegetable" >Im a potato!</thing>
        <thing type="fruit" >Im a pear!</thing>
        <thing type="vegetable" >Im a pickle!</thing>
        <thing type="city" >Patna</thing>
        <thing type="fruit" >Im a banana!</thing>
        <thing type="vegetable" >Im an eggplant!</thing>
        <thing type="city" >Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu</thing>
      </thing_list>
      <trailer>
        <chrzaszcz>A</chrzaszcz>
        <zdzblo>B</zdzblo>
      </trailer>
    </batch>

While the first file, "fruit.xml", is ok:

    <batch>
      <header>
        <foo>1</foo>
        <bar>2</bar>
        <baz>3</baz>
      </header>
      <thing_list>
        <thing type="fruit">Im an apple!</thing>
        <thing type="fruit">Im a pear!</thing>
        <thing type="fruit">Im a banana!</thing>
      </thing_list>
      <trailer>
        <chrzaszcz>A</chrzaszcz>
        <zdzblo>B</zdzblo>
      </trailer>
    </batch>

the "veg.xml" is missing an opening tag for <thing_list>:

    <batch>
      <header>
        <foo>1</foo>
        <bar>2</bar>
        <baz>3</baz>
      </header>
        <thing type="vegetable">Im a carrot!</thing>
        <thing type="vegetable">Im a potato!</thing>
        <thing type="vegetable">Im a pickle!</thing>
        <thing type="vegetable">Im an eggplant!</thing>
      </thing_list>
      <trailer>
        <chrzaszcz>A</chrzaszcz>
        <zdzblo>B</zdzblo>
      </trailer>
    </batch>
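To confirm mechanically which of the two outputs is broken, rather than by eye, the files can be re-parsed. This is only a minimal sketch: it relies on XML::Twig's `safe_parsefile`, which wraps the parse in an eval and leaves the error in `$@`; the helper name `wellformedness_error` is made up for this example and is not part of the script above.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns '' if the file parses as XML,
# otherwise the parser's error message.
sub wellformedness_error {
    my ($file) = @_;
    my $twig = XML::Twig->new;
    # safe_parsefile returns a false value on failure and sets $@
    return $twig->safe_parsefile($file) ? '' : ($@ || 'unknown parse error');
}

# Usage: perl check.pl fruit.xml veg.xml
for my $file (@ARGV) {
    my $err = wellformedness_error($file);
    print "$file: ", ($err ? "NOT well-formed: $err" : "well-formed"), "\n";
}
```

Run against the two files, this should report an error for veg.xml (the stray `</thing_list>`) and none for fruit.xml. Parsing stops at the first error, so only the first problem in each file is reported.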
I have also noticed that if I comment out the <thing_list> tags in the data, the comment corresponding to the opening tag is also missing from veg.xml, but not from fruit.xml...
FWIW, I am using Strawberry Perl 5.20.1 on a Windows 7 box.
Update: in the case of the comments, I think I understand that the first comment is output while processing the first <thing>, and that the second should be output by the _default_ handler while processing the rest of the file. But I do not understand whether the same applies when <thing_list> is not commented out.
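The ordering guessed at in the update can be checked by tracing when the callbacks fire. The sketch below is a cut-down stand-in for the script above, not a fix: it uses shortened sample data, records events in an array instead of flushing, and adds a `start_tag_handlers` entry (not in the original script) to mark when the `<thing_list>` opening tag is parsed.

```perl
use strict;
use warnings;
use XML::Twig;

my @events;    # order in which XML::Twig calls the handlers

my $twig = XML::Twig->new(
    # fires as soon as the opening tag has been parsed
    start_tag_handlers => {
        thing_list => sub { push @events, 'start tag: thing_list' },
    },
    # twig_handlers fire once the element is complete (at its closing tag)
    twig_handlers => {
        thing     => sub { push @events, 'handler: thing (' . $_->att('type') . ')' },
        _default_ => sub { push @events, 'handler: ' . $_->name },
    },
);

# Shortened sample data, same shape as the SSCCE
$twig->parse(<<'XML');
<batch>
  <header><foo>1</foo></header>
  <thing_list>
    <thing type="fruit">apple</thing>
    <thing type="vegetable">carrot</thing>
  </thing_list>
</batch>
XML

print "$_\n" for @events;
```

The `thing_list` start tag is reported before the first `thing` handler, and the `thing_list` handler itself only runs after the last `<thing>`. The XML::Twig documentation says `flush` outputs the twig up to and including the current element. If that covers the pending `<thing_list>` opening tag, then the first `flush` after that tag is the one in `magic` for the first `<thing>`, which writes to only one of the two files. That would fit the missing tag in veg.xml, but it is an inference to verify rather than a confirmed diagnosis.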
Replies are listed 'Best First'.

Re: Buggy output from XML::Twig on a Tee
    by toolic (Bishop) on Feb 25, 2016 at 14:54 UTC
    by seki (Monk) on Feb 25, 2016 at 16:41 UTC

Re: Buggy output from XML::Twig on a Tee
    by ateague (Monk) on Feb 25, 2016 at 14:54 UTC
    by seki (Monk) on Feb 25, 2016 at 16:45 UTC
    by ateague (Monk) on Feb 25, 2016 at 17:27 UTC

Re: Buggy output from XML::Twig on a Tee
    by mr_ron (Deacon) on Feb 26, 2016 at 16:25 UTC
    by seki (Monk) on Mar 01, 2016 at 16:04 UTC