Micz has asked for the wisdom of the Perl Monks concerning the following question:

Masters of all things Perl, I would like to perform an operation on every tag in an XML file. I am using XML::Parser and XML::SimpleObject to do the following:
my $parser = XML::Parser->new(ErrorContext => 2, Style => "Tree");
my $xso = XML::SimpleObject->new( $parser->parse($content) )
    or die "could not parse!";

foreach my $form ($xso->children()) {
    check_tag($form);
}

sub check_tag {
    foreach my $tag ($_[0]->children()) {
        print $tag->name." ".$tag->value."\n";
        check_tag($tag);
    }
}

This works, but it does not show me all the tags, just some of them. What is wrong with my code?

thanks, jan

Replies are listed 'Best First'.
Re: looping over xml
by mirod (Canon) on Aug 03, 2001 at 22:04 UTC

    Without the XML data you are using this code on it is quite difficult to figure out what happens.

    Just a couple of remarks though: if you replace your main loop with check_tag($xso), you will also get the document root. As written, your loop starts one level down, so the root is never printed:

    my $parser = XML::Parser->new(ErrorContext => 2, Style => "Tree");
    my $xso = XML::SimpleObject->new( $parser->parse($content) )
        or die "could not parse!";

    check_tag($xso);
    ...

    A second remark: I hope you are dealing with data-oriented XML, as opposed to document-oriented XML. XML::SimpleObject does not deal properly with mixed content such as <p>this is <b>mixed</b> content</p> (neither does XML::Simple, BTW; must be the name): in this case the content of the p element will be "this is content". This is a weakness of all modules that stick to a model of the XML document where the data is directly within an element. You need an extra layer, with PCDATA or CDATA pseudo-elements (or text nodes) within all elements, or you cannot deal with mixed content.
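To illustrate that extra layer of text nodes, here is a minimal sketch that walks the structure returned by XML::Parser's Tree style, where text shows up as pseudo-elements with the tag 0. The sample string, the walk helper and the @seen array are made up for illustration; this is not how XML::SimpleObject works internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Style => "Tree" returns [ $root_tag, $content ], where $content is
# [ \%attributes, $tag1, $content1, $tag2, $content2, ... ].
# A tag of 0 marks a text node; its "content" is the string itself.
my $tree = XML::Parser->new(Style => 'Tree')
    ->parse('<p>this is <b>mixed</b> content</p>');

my @seen;    # every node, in document order (hypothetical helper array)

sub walk {
    my ($tag, $content, $depth) = @_;

    # text pseudo-element: record the string and stop descending
    if ($tag eq '0') {
        push @seen, ('  ' x $depth) . qq{text "$content"};
        return;
    }

    # real element: record it, then visit its (tag, content) pairs
    push @seen, ('  ' x $depth) . "<$tag>";
    my ($attributes, @children) = @$content;
    while (my ($child_tag, $child_content) = splice(@children, 0, 2)) {
        walk($child_tag, $child_content, $depth + 1);
    }
}

walk(@$tree, 0);
print "$_\n" for @seen;
```

Because the text nodes are kept as children in their own right, the words before and after the b element stay in their original positions instead of being merged into a single string.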

    If you post a minimum example of your XML data we might be able to help you more.

    And now the obligatory XML::Twig version. Note that I will probably add the is_field method in the next release, as it seems to make sense for people who do data-oriented XML.

    #!/bin/perl -w
    use strict;
    use XML::Twig;

    my $twig = XML::Twig->new();
    $twig->parse( \*DATA) or die "could not parse!";

    # the #ELT argument means that we will get only the
    # "real" elements, as opposed to the ones containing the text
    foreach my $tag ($twig->descendants( '#ELT')) {
        print $tag->gi;
        print " ", $tag->text if( is_field( $tag));
        print "\n";
    }

    sub is_field {
        my $tag= shift;
        return 1 if( ($tag->children == 1) && $tag->first_child->is_text);
        return 0;
    }

    __DATA__
    <doc id="id1">
      <elt id="id2">elt 1</elt>
      <elt id="id3">
        <subelt id="id4">subelt 1</subelt>
        <subelt id="id5">subelt 2</subelt>
        <subelt id="id6">subelt 3</subelt>
      </elt>
      <elt id="id7">elt 3</elt>
      <elt id="id8"><subelt id="id9">subelt 4</subelt></elt>
    </doc>
      Hello,
      Thank you for your remarks. Here is a sample of the XML I am trying to parse:
      <?xml version="1.0" encoding="UTF-8" ?>
      <vxml version="1.0" application="news_root.xml">
        <form id="Kategorie">
          <field name="kategorie">
            <grammar src="gram/kategorie.grammar"/>
            <prompt bargein="true">
              <audio src="http://xxx.61.xxx.xxx/audio/portal/news/10.wav">xaxaxaxa</audio>
              <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/input_start.wav">xaxaxaxa</audio>
            </prompt>
            <filled>
              <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/input_end.wav">input end</audio>
              <if cond="kategorie == 'menue'"><goto next="../index.jsp"/>
                <elseif cond="kategorie == 'schlagzeilen'"/><goto next="#Schlagzeilen"/>
                <elseif cond="kategorie == 'wirtschaft'"/><goto next="#Wirtschaft"/>
                <elseif cond="kategorie == 'boerse'"/><goto next="#Boerse"/>
                <elseif cond="kategorie == 'sport'"/><goto next="#Sport"/>
                <elseif cond="kategorie == 'unterhaltung'"/><goto next="#Unterhaltung"/>
                <else/><clear namelist="kategorie"/>123.123.123.123/nee<reprompt/>
              </if>
            </filled>
          </field>
          <noinput count="1">
            <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/noinput.wav">noinput</audio>
            <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/40.wav">Ich habe sie wirklich nicht gehoert.</audio>
            <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/pauses/500.wav">kleine Pause</audio>
            <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/input_start.wav">input start</audio>
          </noinput>
          <noinput count="2">
            <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/noinput.wav">noinput</audio>
            <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/50.wav">Ich habe sie wirklich nicht gehoert.</audio>
            <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/pauses/500.wav">kleine Pause</audio>
            <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/input_start.wav">input start</audio>
          </noinput>
        </form>
      </vxml>
      I will implement the first point and post further feedback. If there is a well-formedness error in the XML above, it is because I trimmed it a bit; the document itself is longer (and validates).
      A further note: my goal is to have all the audio and goto tags in a hash. If my approach is totally wrong, please warn me.
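One possible way to reach that goal, offered as a sketch rather than as the approach used in this thread, is to let XML::Parser call a Start handler for every opening tag and fill the hashes there, with no tree to walk afterwards. The sample document is a cut-down, hypothetical version of the VoiceXML above (the relative src path is invented), and the handle_start name is an assumption; %soundfiles and %links only mirror the names that appear in the link checker.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample, cut down from the VoiceXML posted above.
my $content = <<'XML';
<vxml version="1.0">
  <form id="Kategorie">
    <prompt><audio src="news/10.wav">xaxaxaxa</audio></prompt>
    <filled>
      <if cond="kategorie == 'menue'"><goto next="../index.jsp"/>
      <elseif cond="kategorie == 'sport'"/><goto next="#Sport"/></if>
    </filled>
  </form>
</vxml>
XML

my (%soundfiles, %links);

# XML::Parser calls the Start handler as ($expat, $element, %attributes)
# for every opening tag, whatever its nesting depth.
sub handle_start {
    my ($expat, $element, %attr) = @_;
    if ($element eq 'audio' && defined $attr{src}) {
        $soundfiles{ $attr{src} } = 'audio';
    }
    elsif ($element eq 'goto' && defined $attr{next}) {
        $links{ $attr{next} } = 'goto';
    }
}

XML::Parser->new(Handlers => { Start => \&handle_start })->parse($content);

print "audio: $_\n" for sort keys %soundfiles;
print "goto:  $_\n" for sort keys %links;
```

Since only attributes are needed here, the handler never has to look at element text, so mixed content inside the if element does not get in the way.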

      thanks again, jan
        Okay, here is more of my code, a simple link checker:
        $tasks{$URL} = $URL;

        while (keys(%tasks) != 0) {
            foreach $key (keys(%tasks)) {
                if (!$opt_v) { print "#"; }
                else         { print "checking $key\n"; }
                check_page($key);
                # remove the finished task so that the while loop can terminate
                delete $tasks{$key};
            }
        }

        sub check_page {
            unless (defined($content = get $_[0])) {
                print "\n\n NASTY ERROR! \n\n bad link: $_[0]\n\n\n";
                return;
            }
            my $parser = XML::Parser->new(ErrorContext => 2, Style => "Tree");
            my $xso = XML::SimpleObject->new( $parser->parse($content) )
                or die "could not parse!";
            check_tag($xso);

            # output the information
            foreach $file (keys %soundfiles) {
                print "\n BAD ".$file." called in $_[0]\n";
            }
            foreach $link (keys %links) {
                next if make_url($link) =~ /$_[0]/;
                next if $link =~ /^#/;
                $tasks{make_url($link)} = make_url($link);
                print "checking ".make_url($link)."\n";
            }
        }

        # recurse through all elements, collecting audio and goto tags
        sub check_tag {
            foreach my $tag ($_[0]->children()) {
                if ($tag->name =~ /audio/) { build_audio($tag); }
                if ($tag->name =~ /goto/) {
                    $links{$tag->attribute('next')} = "goto";
                }
                check_tag($tag);
            }
        }

        # turn a relative link into an absolute URL
        sub make_url {
            if    ($_[0] =~ /http:/) { return $_[0]; }
            elsif ($_[0] =~ /^#/)    { return $_[0]; }
            else                     { return $base_url.$_[0]; }
        }

        # record an audio file as BAD if a HEAD request for it fails
        sub build_audio {
            if (!exists $soundfiles{$_[0]->attribute('src')}) {
                unless (head(make_url($_[0]->attribute('src')))) {
                    $soundfiles{make_url($_[0]->attribute('src'))} = "BAD";
                }
            }
        }