PerlMonks  

XML:Twig -- changing in-mem process to stream

by rkg (Hermit)
on Oct 15, 2003 at 09:38 UTC ( [id://299363]=perlquestion )

rkg has asked for the wisdom of the Perl Monks concerning the following question:

Hi --

I am new to XML::Twig. A neat module indeed.

I have a question about stream vs. in-memory processing.

I am using Twig to change the form of an XML document. The DTD is complex and the document is large; here's an abstraction of the base document.

#### HAND EDITED, NOT TESTED
<a>
  <b name="funny words">
    <c name="foo"/>
    <c name="baz"/>
  </b>
  <b name="foods">
    <c name="apple"/>
    <c name="pear"/>
    <c name="cheese"/>
  </b>
</a>

I want to promote each "C" into its own "B", cloning the parent and making each kid an only-child, yielding something like this:

#### HAND EDITED, NOT TESTED
<a>
  <b name="foo">
    <c name="foo"/>
  </b>
  <b name="baz">
    <c name="baz"/>
  </b>
  <b name="apple">
    <c name="apple"/>
  </b>
  <b name="pear">
    <c name="pear"/>
  </b>
  <b name="cheese">
    <c name="cheese"/>
  </b>
</a>

The Twig code I've written works, and is something like this:

#### HAND EDITED
use XML::Twig;

my $t = XML::Twig->new(
    twig_handlers => { b => \&b, },
    pretty_print  => 'indented',
);
$t->parsefile($xml_file);
$t->flush;

sub b {
    my ( $t, $x ) = @_;
    my @c = $x->children('c');

    # group the cut c elements by their name attribute
    my %bs;
    foreach my $c (@c) {
        my $text = $c->att('name');
        $c->cut;
        push( @{ $bs{$text} }, $c );
    }

    # for each name, add a copy of the parent b holding copies of those c's
    foreach my $text ( keys %bs ) {
        my $b = $x->insert_new_elt( 'after', 'b', { %{ $x->atts } } );
        $b->set_att( 'name' => $text );
        foreach ( @{ $bs{$text} } ) {
            $b->insert_new_elt( 'first_child', 'c', { %{ $_->atts } } );
        }
    }
    $x->delete;    # remove the original b
}

My questions:
  • I think I am processing the document in memory, as it pauses for a long while before spitting out output. Is this the case?
  • If so, how could I change this to a stream process, so that I can run much larger docs w/o exhausting memory? I've tried flush and print and couldn't get them right; the resulting XML was garbled.
  • Is there a better way to do this (clone parents to give each kid their own parent) in Twig?
  • Dumb question: is there a nice Twig way to send the XML to a file, or do I need to tie STDOUT?
Thanks!

rkg

PS When I said "hand edited" above, I meant that I took working code and working XML and simplified them for this post -- possibly a typo crept in during the simplification. But the original works and changes the original XML appropriately.

Replies are listed 'Best First'.
Re: XML:Twig -- changing in-mem process to stream
by mirod (Canon) on Oct 15, 2003 at 11:48 UTC

    Wahouh! You sure do things the hard way!

    Below is a version that loads only one b at a time.

    To send the XML to a file just pass a filehandle ref to flush or print: $t->flush( \*FILE).

    #!/usr/bin/perl -w
    use strict;
    use XML::Twig;

    my $t = XML::Twig->new(
        twig_handlers => { b => \&b, },
        pretty_print  => 'indented',
    );
    $t->parse( \*DATA );
    $t->flush;

    sub b {
        my ( $t, $b ) = @_;
        foreach my $c ( $b->children('c') ) {
            # yep, that does it: wrap a b element around the c
            $c->wrap_in( b => { name => $c->att('name') } );
        }
        $b->erase;    # remove the original b
        $t->flush;    # you need to flush here if you want to free the memory
    }

    __DATA__
    <a>
      <b name="funny words">
        <c name="foo"/>
        <c name="baz"/>
      </b>
      <b name="foods">
        <c name="apple"/>
        <c name="pear"/>
        <c name="cheese"/>
      </b>
    </a>
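
The two points in this reply (flushing inside the handler, and passing a filehandle to flush) can be combined. What follows is only a sketch of that combination, not code from the thread: the helper name promote_children, the file name out.xml, and the sample XML string are made up for illustration, and it assumes flush accepts a lexical filehandle the same way it accepts \*FILE. Note that the final flush after parsing goes to the same handle as the flushes in the handler, so the closing tag of the root ends up in the file as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: parse $xml and stream the rewritten document to $out.
sub promote_children {
    my ( $xml, $out ) = @_;

    my $twig = XML::Twig->new(
        twig_handlers => {
            b => sub {
                my ( $t, $elt ) = @_;

                # Give every c its own b parent, named after the c.
                foreach my $c ( $elt->children('c') ) {
                    $c->wrap_in( b => { name => $c->att('name') } );
                }
                $elt->erase;        # drop the original b, keep its children
                $t->flush($out);    # write what is finished and free it
            },
        },
        pretty_print => 'indented',
    );

    $twig->parse($xml);
    $twig->flush($out);    # final flush, same handle: closes the root element
    return;
}

# Usage: write the result to out.xml (hypothetical file name).
my $xml = '<a><b name="foods"><c name="apple"/><c name="pear"/></b></a>';
open my $out, '>', 'out.xml' or die "Cannot open out.xml: $!";
promote_children( $xml, $out );
close $out or die "Cannot close out.xml: $!";
```

For a large input file, calling parsefile with the file name in place of parse with a string should keep the same handler and flush logic; only one b element is held in memory at a time.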
Re: XML:Twig -- changing in-mem process to stream
by rkg (Hermit) on Oct 15, 2003 at 09:49 UTC
    Update: When I said "the DTD is complex", I meant the tags have many attributes not shown here. The structure is really simple and is as above: one "A" element, containing 1+ "B" elems, each with 1+ "C" elems, each "C" elem with no children. No content anywhere, just tags.
