nico38100 has asked for the wisdom of the Perl Monks concerning the following question:

Hello guys, I am trying to use the XML::Twig module to delete some nodes from a file and print the result to a new file. The files are about 2 GB, so I tried to parse the XML chunk by chunk, but the method I am using is not working: the resulting XML is corrupted at the end. I think it is because the flush does not cover the whole file.

Here is the code:

#------------------------------------------------------------------------------
# This function is to add a twig handler to remove the node in parameter
#------------------------------------------------------------------------------
sub delete_node {
#------------------------------------------------------------------------------
    my ($node_to_delete, $file_to_read) = @_;

    copy($file_to_read, $file_to_read.'-tmp_'.$dateLog.'.xml');
    open my $file_end, '>', $file_to_read.'-parsed_'.$dateLog.'.xml';

    my $handlers = { $node_to_delete => section_delete_node($file_end) };
    my $twig = XML::Twig->new(
        pretty_print  => 'indented',
        twig_handlers => $handlers,
    )->parsefile($file_to_read);

    close $file_end;

    if (-s $file_to_read.'-parsed_'.$dateLog.'.xml') {
        move($file_to_read.'-parsed_'.$dateLog.'.xml', $file_to_read);
    }
    else {
        move($file_to_read.'-tmp_'.$dateLog.'.xml', $file_to_read);
    }
}

#------------------------------------------------------------------------------
# This function remove the node in parameter
#------------------------------------------------------------------------------
sub section_delete_node {
    my ($file_end) = @_;
    return sub {
        my ($t, $section) = @_;
        $section->delete();
        $t->flush($file_end);
    }
}

For example, I want to delete the dob nodes from my file:

<?xml version="1.0" encoding="UTF-8"?>
<record category="B" editor="" entered="2000-12-04" sub-category="PEP" uid="7320" updated="2018-12-12">
  <person ssn="" e-i="E">
    <title xsi:nil="true"/>
    <position xsi:nil="true"/>
    <names>
      <first_name/>
      <last_name>BA</last_name>
    </names>
    <agedata>
      <age xsi:nil="true"/>
      <as_of_date xsi:nil="true"/>
      <dob xsi:nil="true"/>
      <dobs>
        <dob xsi:nil="true"/>
      </dobs>
      <deceased xsi:nil="true"/>
    </agedata>
  </person>
  <details>
    <id_numbers>
      <id loc="INT" type="">fd</id>
    </id_numbers>
    <place_of_birth xsi:nil="true"/>
    <locations>
      <location country="df" city="re" state="">Shahrara</location>
    </locations>
  </details>
</record>

And here is the result:

<?xml version="1.0" encoding="UTF-8"?>
<record category="B" editor="" entered="2000-12-04" sub-category="PEP" uid="7320" updated="2018-12-12">
  <person ssn="" e-i="E">
    <title xsi:nil="true"/>
    <position xsi:nil="true"/>
    <names>
      <first_name/>
      <last_name>BA</last_name>
    </names>
    <agedata>
      <age xsi:nil="true"/>
      <as_of_date xsi:nil="true"/>
      <deceased xsi:nil="true"/>
    </agedata>
  </person>
</record>
  <details>
    <id_numbers>
      <id loc="INT" type="">fd</id>
    </id_numbers>
    <place_of_birth xsi:nil="true"/>
    <locations>
      <location country="df" city="re" state="">Shahrara</location>
    </locations>
  </details>

Any help would be great. Many thanks.

Replies are listed 'Best First'.
Re: XML:twig XML wrong
by Discipulus (Canon) on Jan 30, 2019 at 12:35 UTC
    Hello nico38100 and welcome to the monastery and to the wonderful world of perl!

    I'm a bit rusty with XML but, if I understood your request, this deletes the unwanted elements:

    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new(
        twig_handlers => {
            'dob'     => sub { $_->delete; },
            'dobs'    => sub { $_->delete; },
            _default_ => sub { $_[0]->flush },
        },
        pretty_print => 'indented',
        empty_tags   => 'normal',
    );
    $twig->parse( *DATA );

    __DATA__
    <?xml version="1.0" encoding="UTF-8"?>
    <record category="B" editor="" entered="2000-12-04" sub-category="PEP" uid="7320" updated="2018-12-12">
      <person ssn="" e-i="E">
        <title xsi:nil="true"/>
        <position xsi:nil="true"/>
        <names>
          <first_name/>
          <last_name>BA</last_name>
        </names>
        <agedata>
          <age xsi:nil="true"/>
          <as_of_date xsi:nil="true"/>
          <dob xsi:nil="true"/>
          <dobs>
            <dob xsi:nil="true"/>
          </dobs>
          <deceased xsi:nil="true"/>
        </agedata>
      </person>
      <details>
        <id_numbers>
          <id loc="INT" type="">fd</id>
        </id_numbers>
        <place_of_birth xsi:nil="true"/>
        <locations>
          <location country="df" city="re" state="">Shahrara</location>
        </locations>
      </details>
    </record>

    The output will be:

    <?xml version="1.0" encoding="UTF-8"?>
    <record category="B" editor="" entered="2000-12-04" sub-category="PEP" uid="7320" updated="2018-12-12">
      <person e-i="E" ssn="">
        <title xsi:nil="true"/>
        <position xsi:nil="true"/>
        <names>
          <first_name/>
          <last_name>BA</last_name>
        </names>
        <agedata>
          <age xsi:nil="true"/>
          <as_of_date xsi:nil="true"/>
          <deceased xsi:nil="true"/>
        </agedata>
      </person>
      <details>
        <id_numbers>
          <id loc="INT" type="">fd</id>
        </id_numbers>
        <place_of_birth xsi:nil="true"/>
        <locations>
          <location city="re" country="df" state="">Shahrara</location>
        </locations>
      </details>
    </record>
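The example above reads from DATA and prints to STDOUT, while the question was about writing the filtered document to a new file. The same handler setup could probably be pointed at a file handle along the following lines. This is only a sketch: the helper name strip_elements and the file names are made up for illustration, and it assumes that flush accepts an output filehandle as its first argument, which is how the code in the question already calls it.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not from the thread): copy $in to $out, leaving out
# every element whose name is listed in @names.
sub strip_elements {
    my ($in, $out, @names) = @_;

    open my $out_fh, '>:encoding(UTF-8)', $out
        or die "Cannot open $out for writing: $!";

    # Named elements are deleted as soon as they have been fully parsed.
    my %handlers;
    $handlers{$_} = sub { $_[1]->delete } for @names;

    # Every other element triggers a flush to the output file, so the parts
    # already parsed are written out and released from memory.
    $handlers{_default_} = sub { $_[0]->flush($out_fh) };

    XML::Twig->new(
        twig_handlers => \%handlers,
        pretty_print  => 'indented',
    )->parsefile($in);

    close $out_fh or die "Cannot close $out: $!";
    return;
}

# Example call with made-up file names:
# strip_elements('input.xml', 'output.xml', 'dob', 'dobs');
```

Because the root element also falls under the _default_ handler, the last flush happens when the root closes, so the closing tags should end up in the same file as the rest of the document.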

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      Hello Discipulus, and many thanks for your help. I tried what you proposed and it works amazingly well. Have a nice day and thanks again!

Re: XML:twig XML wrong
by choroba (Cardinal) on Jan 30, 2019 at 14:50 UTC
    You can also use XML::XSH2's stream command, which doesn't load the whole document into memory:
    stream :f input.xml :F output.xml select (dobs | dob) { rm ../* ; }

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: XML:twig XML wrong
by Jenda (Abbot) on Mar 20, 2019 at 00:53 UTC

    If you do not need the XML to be reformatted and do not mind the empty lines left over after removing those tags, then this will be even more efficient:

    use strict;
    use warnings;
    use XML::Rules;

    my $rules = XML::Rules->new(
        style => 'filter',
        rules => {
            '^dob'  => 'skip',
            '^dobs' => 'skip',
        },
    );
    $rules->filter( *DATA );

    __DATA__
    <?xml version="1.0" encoding="UTF-8"?>
    <record category="B" editor="" entered="2000-12-04" sub-category="PEP" uid="7320" updated="2018-12-12">
    ...

    If you wanted to skip only some of the <dob> tags based on their attributes, you could do something like this:

    my $rules = XML::Rules->new(
        style => 'filter',
        rules => {
            '^dob'  => sub { $_[1]{"xsi:nil"} eq "true" ? '' : 'handle' },
            '^dobs' => 'skip',
        },
    );

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.