ateague has asked for the wisdom of the Perl Monks concerning the following question:
Good morning!
I am using XML::Twig to conditionally filter out elements in an XML file and then conditionally "duplex" the output to two different output files. I have managed to jury-rig something that gives me the correct output, but I imagine there is a better, more correct way to accomplish the task that does not involve reprocessing the input file multiple times.
In my sample program below, I am splitting <thing> elements with type attributes of "vegetable" and "fruit" off into separate files. <thing> elements with a type of "city" are filtered out and deleted. The <header> and <trailer> elements are duplexed to both output files. Is there a way to conditionally split target elements off into separate files, and copy the elements "outside" the target elements to every output file, without having to read the input file multiple times?
```perl
#!/usr/bin/perl
use 5.018;
use strict;
use warnings;
use XML::Twig;

{
    my $t;
    my $pos = tell 'DATA';    # save the offset...

    # Process fruit
    open (my $FRUIT, '>', './fruit.xml') or die "./fruit.xml:\n$!\n$^E";
    $t = XML::Twig->new(
        twig_handlers => {
            'thing'     => sub { _filter(@_, 'fruit', $FRUIT); 1; },
            'thing//*'  => sub { 1; },
            '_default_' => sub { $_[0]->flush($FRUIT); 1; },
            '#CDATA'    => sub { 1; },
        },
        pretty_print => 'indented',
        comments     => 'drop',      # remove any comments
        empty_tags   => 'normal',    # empty tags = <tag/>
    );
    $t->parse(*DATA);
    close $FRUIT;

    seek 'DATA', $pos, 0;    # reset DATA for the second run-through

    # Process vegetables
    open (my $VEG, '>', './veg.xml') or die "./veg.xml:\n$!\n$^E";
    $t = XML::Twig->new(
        twig_handlers => {
            'thing'     => sub { _filter(@_, 'vegetable', $VEG); 1; },
            'thing//*'  => sub { 1; },
            '_default_' => sub { $_[0]->flush($VEG); 1; },
            '#CDATA'    => sub { 1; },
        },
        pretty_print => 'indented',
        comments     => 'drop',      # remove any comments
        empty_tags   => 'normal',    # empty tags = <tag/>
    );
    $t->parse(*DATA);
    close $VEG;
}

sub _filter {
    my ($_twig, $thing_element, $keep_me, $PRINT_FILE) = @_;

    # Flush the twig to file if the 'type' attribute matches...
    if ( $thing_element->{att}{type} eq $keep_me ) {
        $_twig->flush($PRINT_FILE);
    }
    # ... otherwise delete the twig
    else {
        $thing_element->delete();
    }
    return 1;
}

__DATA__
<batch>
  <header>
    <foo>1</foo>
    <bar>2</bar>
    <baz>3</baz>
  </header>
  <thing type="fruit" >Im an apple!</thing>
  <thing type="city" >Toronto</thing>
  <thing type="vegetable" >Im a carrot!</thing>
  <thing type="city" >Melrose</thing>
  <thing type="vegetable" >Im a potato!</thing>
  <thing type="fruit" >Im a pear!</thing>
  <thing type="vegetable" >Im a pickle!</thing>
  <thing type="city" >Patna</thing>
  <thing type="fruit" >Im a banana!</thing>
  <thing type="vegetable" >Im an eggplant!</thing>
  <thing type="city" >Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu</thing>
  <trailer>
    <chrzaszcz>A</chrzaszcz>
    <zdzblo>B</zdzblo>
  </trailer>
</batch>
```
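A single-pass version of the split can be sketched by letting one twig route each completed <thing> to a filehandle chosen by its type attribute, printing <header> and <trailer> to every handle, and calling purge after each print so memory stays bounded. This is a minimal sketch under stated assumptions, not the only way to do it: the root is assumed to be a bare <batch> with no attributes (so its tags are written by hand), the input is a trimmed inline copy of the __DATA__ section, output goes to in-memory buffers instead of ./fruit.xml and ./veg.xml, and the helper names print_to_all, copy_to_all and %out_for are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input: a trimmed copy of the post's __DATA__.
my $xml = <<'XML';
<batch>
  <header><foo>1</foo><bar>2</bar></header>
  <thing type="fruit">Im an apple!</thing>
  <thing type="city">Toronto</thing>
  <thing type="vegetable">Im a carrot!</thing>
  <thing type="fruit">Im a pear!</thing>
  <trailer><zdzblo>B</zdzblo></trailer>
</batch>
XML

# One output handle per wanted type. In-memory buffers keep the sketch
# self-contained; open './fruit.xml' and './veg.xml' instead for real files.
my ($fruit_xml, $veg_xml) = ('', '');
open my $FRUIT, '>', \$fruit_xml or die "fruit buffer: $!";
open my $VEG,   '>', \$veg_xml   or die "veg buffer: $!";
my %out_for = (fruit => $FRUIT, vegetable => $VEG);
my @all_out = ($FRUIT, $VEG);

# "Duplex": write the same text to every output handle.
sub print_to_all {
    my ($text) = @_;
    print {$_} $text for @all_out;
}

# Handler for elements outside the split targets: copy them to every
# file, then purge everything parsed so far.
sub copy_to_all {
    my ($t, $elt) = @_;
    print_to_all($elt->sprint . "\n");
    $t->purge;
    return 1;
}

# Assumption: the root is a plain <batch> with no attributes.
print_to_all("<batch>\n");

my $twig = XML::Twig->new(
    comments      => 'drop',
    twig_handlers => {
        'header'  => \&copy_to_all,
        'trailer' => \&copy_to_all,
        # Each <thing> goes to the handle for its type; types with no
        # handle (e.g. "city") are never printed and get purged.
        'thing'   => sub {
            my ($t, $thing) = @_;
            my $fh = $out_for{ $thing->att('type') // '' };
            print {$fh} $thing->sprint, "\n" if $fh;
            $t->purge;
            return 1;
        },
    },
);

$twig->parse($xml);    # a filehandle such as \*DATA can be passed instead
print_to_all("</batch>\n");
close $_ for @all_out;
```

Switching to real files only changes the two open calls. Unwanted types such as "city" need no special case because they have no entry in %out_for. Adding pretty_print => 'indented' to new(), as in the original, changes only the whitespace of each printed element.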
Thank you for your time.
```
perl -v
This is perl 5, version 18, subversion 2 (v5.18.2) built for MSWin32-x64-multi-thread
(with 1 registered patch, see perl -V for more detail)

perl -MXML::Twig -E "say $XML::Twig::VERSION;"
3.48
```
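If the input is small enough to hold in memory, another option is to parse it once, deep-copy the root once per output, and delete the non-matching <thing> children from each copy. A sketch, assuming a trimmed inline input and a hypothetical %doc_for hash that holds the serialized result for each type; copy, children, att, delete and sprint are standard XML::Twig::Elt methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input: a trimmed copy of the post's __DATA__.
my $xml = <<'XML';
<batch>
  <header><foo>1</foo></header>
  <thing type="fruit">Im an apple!</thing>
  <thing type="city">Toronto</thing>
  <thing type="vegetable">Im a carrot!</thing>
  <trailer><zdzblo>B</zdzblo></trailer>
</batch>
XML

# Parse the input exactly once and keep the whole tree in memory.
my $twig = XML::Twig->new(comments => 'drop');
$twig->parse($xml);

# For each wanted type, deep-copy the root and delete every <thing>
# child whose type does not match (this also drops "city").
my %doc_for;
for my $type (qw(fruit vegetable)) {
    my $root = $twig->root->copy;
    for my $thing ($root->children('thing')) {
        $thing->delete unless ($thing->att('type') // '') eq $type;
    }
    # Print this string to a file handle to write the real output file.
    $doc_for{$type} = $root->sprint;
}
```

This trades the bounded memory of the flush/purge approach for simpler code: every output is built from a full copy of the tree, so it suits files of modest size.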
Re: XML::Twig - Filtering and duplexing output to multiple output files
by Loops (Curate) on Nov 12, 2014 at 00:19 UTC
by ateague (Monk) on Nov 12, 2014 at 22:06 UTC
Re: XML::Twig - Filtering and duplexing output to multiple output files
by Discipulus (Canon) on Nov 12, 2014 at 08:06 UTC
by mirod (Canon) on Nov 12, 2014 at 14:14 UTC
by Discipulus (Canon) on Nov 13, 2014 at 08:47 UTC