Dear Monks,
I am trying to split an XML file into multiple well-formed fragments. An old solution posted here, "Re: XML::Twig - Filtering and duplexing output to multiple output files", does pretty much what I am looking for, using XML::Twig to write into an IO::Tee... at least with simple input data.
If I complicate the data structure a little by grouping the nodes to filter under a parent node, the second file is no longer well-formed: the parent node is missing its opening tag. I am at a loss to find the cause.
SSCCE (the only difference from the original example is the <thing_list> that contains the <thing>s):
use XML::Twig;
use IO::Tee;
use feature 'say';
open my $frufile, '>', 'fruit.xml' or die "fruit $!";
open my $vegfile, '>', 'veg.xml' or die "veg $!";
my $tee = IO::Tee->new($frufile, $vegfile);
select $tee;
my $twig = XML::Twig->new(
    twig_handlers => {
        thing     => \&magic,
        _default_ => sub {
            say STDOUT '_default_ for '.$_->name;
            $_[0]->flush($tee); #default filehandle = tee
            1;
        },
    },
    pretty_print => 'indented',
    empty_tags   => 'normal',
);
$twig->parse( *DATA );
sub magic {
    my ($thing, $element) = @_;
    say STDOUT "magic for ". $element->{att}{type};
    for ($element->{att}{type}) {
        if (/fruit/) {
            $thing->flush($frufile);
        } elsif (/vegetable/) {
            $thing->flush($vegfile);
        } else {
            $thing->purge;
        }
    }
    1;
}
__DATA__
<batch>
<header>
<foo>1</foo>
<bar>2</bar>
<baz>3</baz>
</header>
<thing_list>
<thing type="fruit" >Im an apple!</thing>
<thing type="city" >Toronto</thing>
<thing type="vegetable" >Im a carrot!</thing>
<thing type="city" >Melrose</thing>
<thing type="vegetable" >Im a potato!</thing>
<thing type="fruit" >Im a pear!</thing>
<thing type="vegetable" >Im a pickle!</thing>
<thing type="city" >Patna</thing>
<thing type="fruit" >Im a banana!</thing>
<thing type="vegetable" >Im an eggplant!</thing>
<thing type="city" >Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu</thing>
</thing_list>
<trailer>
<chrzaszcz>A</chrzaszcz>
<zdzblo>B</zdzblo>
</trailer>
</batch>
While the first file, "fruit.xml", is fine:
<batch>
<header>
<foo>1</foo>
<bar>2</bar>
<baz>3</baz>
</header>
<thing_list>
<thing type="fruit">Im an apple!</thing>
<thing type="fruit">Im a pear!</thing>
<thing type="fruit">Im a banana!</thing>
</thing_list>
<trailer>
<chrzaszcz>A</chrzaszcz>
<zdzblo>B</zdzblo>
</trailer>
</batch>
the second file, "veg.xml", is missing the opening tag for <thing_list>:
<batch>
<header>
<foo>1</foo>
<bar>2</bar>
<baz>3</baz>
</header>
<thing type="vegetable">Im a carrot!</thing>
<thing type="vegetable">Im a potato!</thing>
<thing type="vegetable">Im a pickle!</thing>
<thing type="vegetable">Im an eggplant!</thing>
</thing_list>
<trailer>
<chrzaszcz>A</chrzaszcz>
<zdzblo>B</zdzblo>
</trailer>
</batch>
I have also noticed that if I comment out the <thing_list> tags in the data, the comment corresponding to the opening tag is also missing from veg.xml, but not from fruit.xml...
FWIW, I am using Strawberry Perl 5.20.1 on a Windows 7 box.
Update: in the case of the comments, I think I understand that the first comment is output while processing the first <thing>, and that the second should be output by the _default_ handler while processing the rest of the file. But I do not understand whether the same thing happens when <thing_list> is not commented out.