PerlMonks  

XML::Twig outputting root element start tag twice

by benizi (Hermit)
on Apr 18, 2006 at 17:45 UTC ( [id://544126]=perlquestion )

benizi has asked for the wisdom of the Perl Monks concerning the following question:

I'm sure I'm missing some interaction between the various options, but I was wondering if someone (mirod?) could tell me how to accomplish the following. I have an XML document that I want to process with XML::Twig. I want the output document to retain the formatting characteristics of the input document. I also want to use twig_roots, since the file will not fit into memory and is record-based (i.e. the processing of each record is self-contained). I used twig_print_outside_roots because I want to specify a filehandle for the default prints/flushes. (Is there something more appropriate for that purpose?) The problem is that the root (wrapper) element's start tag is being output twice.

Example input:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "/path">
<foo version="blah">
<record>stuff</record>
<record>stuff 2</record>
</foo>

Desired output:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "/path">
<foo version="blah">
<record>altered stuff</record>
<record>altered stuff 2</record>
</foo>

My attempt:

#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
open my $outfh, '>', "out.xml" or die ">out.xml:$!";
my $p = XML::Twig->new(
    twig_print_outside_roots => $outfh,
    twig_roots => {
        record => sub {
            $_->set_text("altered ".$_->text);
            shift->flush
        }
    },
    empty_tags => 'html',
    keep_encoding => 1,
    keep_spaces => 1,
);
$p->parsefile("in.xml");
print "DONE\n";

Actual output:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "/path">
<foo version="blah">
<foo version="blah"><record>altered stuff</record> # the extra <foo version="blah"> is the problem.
<record>altered stuff 2</record>
</foo>

Replies are listed 'Best First'.
Re: XML::Twig outputting root element start tag twice
by Tanktalus (Canon) on Apr 18, 2006 at 18:34 UTC

    First, you're missing the end-flush. Before you're really "DONE", you need to add $p->flush(). That gets the extra </foo> tag you're missing in your output. Not that you want it, but once you fix the other problem, you'll want it back.

    Second, it's the twig_print_outside_roots flag that's doing it. Remove that. Instead, change your flush calls (including the new one) to have the param "$outfh". Now you'll flush to that file.

    That leaves me with:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;
    open my $outfh, '>', "out.xml" or die ">out.xml:$!";
    my $p = XML::Twig->new(
        #twig_print_outside_roots => $outfh,
        twig_roots => {
            record => sub {
                $_->set_text("altered ".$_->text);
                shift->flush($outfh),
            }
        },
        empty_tags => 'html',
        keep_encoding => 1,
        keep_spaces => 1,
    );
    $p->parsefile("in.xml");
    $p->flush($outfh);
    print "DONE\n";

    As to why, ... I'm not sure.

    Hope that helps,

    Update: Ok, I see you really want the twig_print_outside_roots feature. It doesn't seem to do what you want it to, though. I am curious, though, as to why the formatting matters - this is XML, after all...

      Explicitly adding the $outfh is part of what I was avoiding, as it's not in the scope of the actual handlers in the real-life example. (XML::Twig has so much DWIMmery, I assumed specifying an output filehandle would be something pretty trivial.)

      As to the formatting, it's because, while I'm using XML::Twig, other people in the project aren't (yet!), and the line-based-ness of the format is easier for them to handle. (Plus, I simply prefer the aesthetics of it.)

        Why is the output filehandle not in scope? If it is available when you create the twig, you should be able to use it in the handlers (you can use a closure to pass it to the handlers). You could also use select to send all output to the filehandle, even though I would consider that not so good for the maintainability of the code.

        Finally, if you are using the latest version of XML::Twig, you don't need the final flush, it's done automagically.
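
        The closure suggestion above could look something like the sketch below. It is only an illustration: make_record_handler is a hypothetical helper name, the input is parsed from a string (with the DOCTYPE line left out for simplicity) and written to an in-memory filehandle instead of out.xml, and it assumes an XML::Twig recent enough to do the final flush itself. It shows only how the handler gets at the filehandle; it does not address keeping the original formatting.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not from the thread): builds a twig_roots handler
# that closes over the output filehandle, so the handler can be defined
# somewhere that $outfh itself is not in scope.
sub make_record_handler {
    my ($outfh) = @_;
    return sub {
        my ($twig, $record) = @_;    # handlers receive the twig and the element
        $record->set_text("altered " . $record->text);
        $twig->flush($outfh);        # flush to the captured filehandle
    };
}

# Sample input modelled on the question (DOCTYPE omitted for this sketch).
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<foo version="blah">
<record>stuff</record>
<record>stuff 2</record>
</foo>
XML

# In-memory filehandle standing in for out.xml.
my $output = '';
open my $outfh, '>', \$output or die "in-memory output: $!";

my $p = XML::Twig->new(
    twig_roots    => { record => make_record_handler($outfh) },
    keep_encoding => 1,
    keep_spaces   => 1,
);
$p->parse($xml);    # assumption: the final flush happens automatically here
close $outfh;

print $output;
```

        The only point is that $outfh is captured when the handler is created, so nothing inside the handler needs a global or a select call. On an older XML::Twig you would still add the explicit $p->flush($outfh) after parsing, as in the code earlier in the thread.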

Node Type: perlquestion [id://544126]
Approved by Tanktalus