Hi --
I am new to XML::Twig. A neat module indeed.
I have a question about stream vs. in-memory processing.
I am using Twig to change the form of an XML document.
The DTD is complex and the document is large; here's an abstraction of the base document.
#### HAND EDITED, NOT TESTED
<a>
  <b name="funny words">
    <c name="foo"/>
    <c name="baz"/>
  </b>
  <b name="foods">
    <c name="apple"/>
    <c name="pear"/>
    <c name="cheese"/>
  </b>
</a>
I want to promote each <c> into its own <b>, cloning the parent (renamed after the kid) and making each kid an only child, yielding something like this:
#### HAND EDITED, NOT TESTED
<a>
  <b name="foo">
    <c name="foo"/>
  </b>
  <b name="baz">
    <c name="baz"/>
  </b>
  <b name="apple">
    <c name="apple"/>
  </b>
  <b name="pear">
    <c name="pear"/>
  </b>
  <b name="cheese">
    <c name="cheese"/>
  </b>
</a>
The Twig code I've written works, and is something like this:
#### HAND EDITED
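use XML::Twig;

# $xml_file holds the path to the input document (set elsewhere).
my $t = XML::Twig->new(
    twig_handlers => {
        b => \&b,    # called each time a <b> element has been fully parsed
    },
    pretty_print => 'indented',
);
$t->parsefile($xml_file);
$t->flush;

sub b {
    my ( $t, $x ) = @_;
    my @c = $x->children('c');

    # Detach each <c> from its parent and group the children by name.
    my %bs;
    foreach my $c (@c) {
        my $text = $c->att('name');
        $c->cut;
        push( @{ $bs{$text} }, $c );
    }

    # Create one new <b> per name, copying the old <b>'s attributes.
    foreach my $text ( keys %bs ) {
        my $new_b =
          $x->insert_new_elt( 'after', 'b', { %{ $x->atts } } );
        $new_b->set_att( 'name' => $text );
        foreach ( @{ $bs{$text} } ) {
            $new_b->insert_new_elt( 'first_child', 'c',
                { %{ $_->atts } } );
        }
    }

    # The original <b> is now empty and no longer needed.
    $x->delete;
}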
My questions:
- I think I am processing the document in memory, as it pauses for a long while before spitting out output. Is this the case?
- If so, how could I change this to stream processing, so that I can run much larger docs w/o exhausting memory? I've tried flush and print and couldn't get them right; the resulting XML was garbled.
- Is there a better way to do this (clone parents to give each kid their own parent) in Twig?
- Dumb question: is there a nice Twig way to send the XML to a file, or do I need to tie STDOUT?
Thanks!
rkg
PS When I said "hand edited" above, I meant that I took working code and working XML and simplified them for this post, so possibly a typo crept in during the simplification. But the original works and changes the original XML appropriately.