dimitarsh1 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have to process an XML file selectively. Each title contains a first-name element (fnm) whose letter attribute holds the first letter of the name. I want to print only those titles that do NOT contain a first name with letter="a". For that I use the following code:

use strict;
use XML::Twig;

my $sel = 'title';
my $ign = 'title/name/fnm[@letter="a"]';

my $twig = XML::Twig->new(
    ignore_elts  => { "$ign" => 1 },
    twig_roots   => { "$sel" => 1 },
    pretty_print => 'indented'
);
$twig->parsefile('1.xml');
$twig->print;

on the following XML:

<?xml version="1.0"?>
<titles>
<title id="1">testing<name><snm>Houston</snm>, <fnm letter="c">Carl</fnm></name><g>69</g><ppg>20.1</ppg><rpg>3.4</rpg><apg>2.8</apg><blk>14</blk></title>
<title id="2">testing<name><snm>Houston</snm>, <fnm letter="a">Allan</fnm></name><g>69</g><ppg>20.1</ppg><rpg>3.4</rpg><apg>2.8</apg><blk>14</blk></title>
<title id="3">testign
  <name>Houston
    <fnm j="b">Bob</fnm>
    <g>49</g>
    <title> testing the sub title in title </title>
    <title> this is pcdata in subtitle
      <title> second sub level </title>
    </title>
  </name>
</title>
<test> This is a test </test>
</titles>

I am new to XML::Twig. I read about ignore_elts and thought it would do the job. It does, but not exactly the way I want. What I want is to drop the whole title whenever the ignore condition is met, so that the output is:

<?xml version="1.0"?>
<titles>
<title id="1">testing<name><snm>Houston</snm>, <fnm letter="c">Carl</fnm></name><g>69</g><ppg>20.1</ppg><rpg>3.4</rpg><apg>2.8</apg><blk>14</blk></title>
<title id="3">testign <name>Houston <fnm j="b">Bob</fnm><g>49</g><title> testing the sub title in title </title><title> this is pcdata in subtitle <title> second sub level </title></title></name></title>
</titles>

while what I get is (the title with id=2 is there but the fnm is missing):

<?xml version="1.0"?>
<titles>
<title id="1">testing<name><snm>Houston</snm>, <fnm letter="c">Carl</fnm></name><g>69</g><ppg>20.1</ppg><rpg>3.4</rpg><apg>2.8</apg><blk>14</blk></title>
<title id="2">testing<name><snm>Houston</snm>, </name><g>69</g><ppg>20.1</ppg><rpg>3.4</rpg><apg>2.8</apg><blk>14</blk></title>
<title id="3">testign <name>Houston <fnm j="b">Bob</fnm><g>49</g><title> testing the sub title in title </title><title> this is pcdata in subtitle <title> second sub level </title></title></name></title>
</titles>

Any suggestions?
Thank you very much in advance.
Regards,
Dimitar

Re: xml::twig twig_roots / ignore_elts
by tangent (Parson) on Oct 25, 2015 at 20:52 UTC
    I don't think you can get ignore_elts to work the way you want. You could write a handler to delete the titles you want to skip:
    use strict;
    use warnings;
    use XML::Twig;

    my $sel = 'title';
    my $twig = XML::Twig->new(
        pretty_print => 'indented',
        twig_roots   => { "$sel" => \&parse_title },
    );
    $twig->parsefile('1.xml');
    $twig->print;

    # Called for each <title>: delete it if it holds a first name with letter="a".
    sub parse_title {
        my ($t, $title) = @_;
        $title->delete if $title->get_xpath('name/fnm[@letter="a"]');
    }
    Output using your example XML:
    <?xml version="1.0"?>
    <titles>
    <title id="1">testing<name><snm>Houston</snm>, <fnm letter="c">Carl</fnm></name><g>69</g><ppg>20.1</ppg><rpg>3.4</rpg><apg>2.8</apg><blk>14</blk></title>
    <title id="3">testign <name>Houston <fnm j="b">Bob</fnm><g>49</g><title> testing the sub title in title </title><title> this is pcdata in subtitle <title> second sub level </title></title></name></title>
    </titles>

      Dear tangent,

      thank you very much for the response. I guess that's the only way to go... I also tried some regexes for twig_roots, but they didn't work.

      If there are other ways to filter titles based on their child nodes, I'd like to learn about them.

      Thanks a lot, once again.
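
One other way to filter on child nodes, offered only as a sketch and not something posted in this thread, might be to skip twig_roots altogether: parse the whole document, then walk the top-level title elements and delete the ones whose children match. This assumes the file is small enough to hold in memory. The helper name filter_titles and the shortened sample XML are made up for the illustration; children, get_xpath, delete and sprint are regular XML::Twig methods.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not from the thread): returns the document as a string
# with every top-level <title> that contains name/fnm[@letter="a"] removed.
sub filter_titles {
    my ($xml) = @_;

    my $twig = XML::Twig->new( pretty_print => 'indented' );
    $twig->parse($xml);    # builds the complete tree in memory

    # Collect the titles first, then delete the matching ones.
    my @titles = $twig->root->children('title');
    for my $title (@titles) {
        my @hits = $title->get_xpath('name/fnm[@letter="a"]');
        $title->delete if @hits;
    }

    return $twig->sprint;
}

# Shortened sample data, assumed for this illustration only.
my $xml = <<'XML';
<titles>
  <title id="1"><name><fnm letter="c">Carl</fnm></name></title>
  <title id="2"><name><fnm letter="a">Allan</fnm></name></title>
  <title id="3"><name><fnm j="b">Bob</fnm></name></title>
</titles>
XML

print filter_titles($xml);
```

Compared with the twig_roots handler, this keeps everything outside the titles (such as the test element in the original file) in the output, at the cost of loading the full document before filtering.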