in reply to looping over xml

Without the XML data you are running this code on, it is quite difficult to figure out what is happening.

Just a couple of remarks, though. If you replace your main loop with check_tag( $xso), you will also get the document root; as it is now, you miss it:

my $parser = XML::Parser->new(ErrorContext => 2, Style => "Tree");
my $xso = XML::SimpleObject->new( $parser->parse($content) )
    or die "could not parse!";
check_tag($xso);
...

A second remark: I hope you are dealing with data-oriented XML, as opposed to document-oriented XML. XML::SimpleObject does not deal properly with mixed content such as <p>this is <b>mixed</b> content</p> (neither does XML::Simple, BTW; it must be the name): in this case the content of the p element will be "this is content". This is a weakness of all modules that model the XML document with the data stored directly in the element. You need an extra layer, with PCDATA or CDATA pseudo-elements (or text nodes) inside every element, or you cannot deal with mixed content.
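To make that "extra layer" concrete, here is a minimal sketch using XML::Twig (assumed to be installed), which keeps text in separate #PCDATA nodes, run on the <p> example above. The variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Parse the mixed-content fragment from the remark above
my $t = XML::Twig->new();
$t->parse('<p>this is <b>mixed</b> content</p>');
my $p = $t->root;

# text() concatenates all the text below <p>, including the text inside <b>
print $p->text, "\n";

# Walking the children shows the extra layer: text nodes on either side of <b>
my @parts;
foreach my $child ($p->children) {
    push @parts, $child->is_text
        ? "text: '" . $child->text . "'"
        : "element: " . $child->gi;
}
print "$_\n" foreach @parts;
```

Because the text nodes sit next to the <b> element instead of being merged into the parent, nothing is lost: you can still tell which words were inside <b>.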

If you post a minimum example of your XML data we might be able to help you more.

And now the obligatory XML::Twig version. Note that I will probably add the is_field method in the next release, as it seems useful for people working with data-oriented XML.

#!/usr/bin/perl -w
use strict;
use XML::Twig;

my $twig = XML::Twig->new();
$twig->parse( \*DATA) or die "could not parse!";

# the #ELT argument means that we will get only the
# "real" elements, as opposed to the text (#PCDATA) nodes
foreach my $tag ($twig->descendants( '#ELT')) {
    print $tag->gi;
    print " ", $tag->text if( is_field( $tag));
    print "\n";
}

# an element is a "field" if its only child is a text node
sub is_field {
    my $tag= shift;
    return 1 if( ($tag->children == 1) && $tag->first_child->is_text);
    return 0;
}

__DATA__
<doc id="id1">
  <elt id="id2">elt 1</elt>
  <elt id="id3">
    <subelt id="id4">subelt 1</subelt>
    <subelt id="id5">subelt 2</subelt>
    <subelt id="id6">subelt 3</subelt>
  </elt>
  <elt id="id7">elt 3</elt>
  <elt id="id8"><subelt id="id9">subelt 4</subelt></elt>
</doc>

Re: Re: looping over xml
by Micz (Beadle) on Aug 06, 2001 at 12:14 UTC
    hello,
    thank you for your remarks. Here is a sample of the XML I am trying to parse:
    <?xml version="1.0" encoding="UTF-8" ?>
    <vxml version="1.0" application="news_root.xml">
      <form id="Kategorie">
        <field name="kategorie">
          <grammar src="gram/kategorie.grammar"/>
          <prompt bargein="true">
            <audio src="http://xxx.61.xxx.xxx/audio/portal/news/10.wav">xaxaxaxa</audio>
            <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/input_start.wav">xaxaxaxa</audio>
          </prompt>
          <filled>
            <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/input_end.wav">input end</audio>
            <if cond="kategorie == 'menue'"><goto next="../index.jsp"/>
            <elseif cond="kategorie == 'schlagzeilen'"/><goto next="#Schlagzeilen"/>
            <elseif cond="kategorie == 'wirtschaft'"/><goto next="#Wirtschaft"/>
            <elseif cond="kategorie == 'boerse'"/><goto next="#Boerse"/>
            <elseif cond="kategorie == 'sport'"/><goto next="#Sport"/>
            <elseif cond="kategorie == 'unterhaltung'"/><goto next="#Unterhaltung"/>
            <else/><clear namelist="kategorie"/>123.123.123.123/nee<reprompt/></if>
          </filled>
        </field>
        <noinput count="1">
          <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/noinput.wav">noinput</audio>
          <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/40.wav">Ich habe sie wirklich nicht gehoert.</audio>
          <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/pauses/500.wav">kleine Pause</audio>
          <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/input_start.wav">input start</audio>
        </noinput>
        <noinput count="2">
          <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/noinput.wav">noinput</audio>
          <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/50.wav">Ich habe sie wirklich nicht gehoert.</audio>
          <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/pauses/500.wav">kleine Pause</audio>
          <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/input_start.wav">input start</audio>
        </noinput>
      </form>
    </vxml>
    I will implement the first point and post further feedback. If there is a well-formedness error in the sample above, it is because I trimmed it a bit; the document itself is longer (and validates).
    A further note: what I want to end up with is all the audio and goto tags in a hash. If my approach is totally wrong, please warn me.

    thanks again, jan
      okay, here is more of my code, a simple link checker:

      use LWP::Simple qw(get head);
      use XML::Parser;
      use XML::SimpleObject;

      # $URL, $base_url and $opt_v are defined elsewhere in the script (not shown)
      my (%tasks, %checked, %links, %soundfiles);

      $tasks{$URL} = $URL;
      while (keys %tasks) {
          foreach my $key (keys %tasks) {
              if (!$opt_v) { print "#"; } else { print "checking $key\n"; }
              $checked{$key} = 1;       # remember visited pages so back-links do not loop forever
              check_page($key);
              delete $tasks{$key};      # was $tasks{key}, which never deleted anything
          }
      }

      sub check_page {
          my $page = shift;
          my $content = get($page);
          unless (defined $content) {
              print "\n\n NASTY ERROR! \n\n bad link: $page\n\n\n";
              return;
          }
          my $parser = XML::Parser->new(ErrorContext => 2, Style => "Tree");
          my $xso = XML::SimpleObject->new( $parser->parse($content) )
              or die "could not parse!";
          check_tag($xso);
          # output the information
          foreach my $file (keys %soundfiles) {
              print "\n BAD $file called in $page\n";
          }
          foreach my $link (keys %links) {
              my $url = make_url($link);
              next if $url =~ /\Q$page\E/;   # \Q...\E: match the page URL literally, not as a regex
              next if $link =~ /^#/;         # skip in-document anchors
              next if $checked{$url};        # do not queue a page that was already checked
              $tasks{$url} = $url;
              print "checking $url\n";
          }
      }

      # recurse through the tree, collecting audio files and goto targets
      sub check_tag {
          my $node = shift;
          foreach my $tag ($node->children()) {
              if ($tag->name eq 'audio') { build_audio($tag); }
              if ($tag->name eq 'goto') {
                  my $next = $tag->attribute('next');
                  $links{$next} = "goto" if defined $next;
              }
              check_tag($tag);
          }
      }

      sub make_url {
          my $url = shift;
          return $url if $url =~ /^http:/ or $url =~ /^#/;
          return $base_url . $url;
      }

      sub build_audio {
          my $url = make_url( $_[0]->attribute('src') );
          # test and store under the same key (the full URL) so a known bad file is not re-checked
          return if exists $soundfiles{$url};
          $soundfiles{$url} = "BAD" unless head($url);
      }

        Using XML::Twig you create and parse the twig with:

        my $t= XML::Twig->new();
        $t->parse( $content);

        You can get all the audio elements by doing: my @audio= $t->descendants( 'audio') (I'll let you figure out how to get all the goto elements ;--)

        You can then access the src attribute of each element using $elt->att( 'src')
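Putting those pieces together, here is a minimal sketch of filling the two hashes Micz's checker uses (%soundfiles and %links) with XML::Twig instead of a recursive check_tag. It assumes XML::Twig is installed; the inline document is a trimmed stand-in for the page fetched with get, and it calls descendants on the root element rather than on the twig. Unlike check_tag, no recursion is needed, because descendants already walks the whole tree.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Trimmed stand-in for the VoiceXML page fetched with LWP::Simple::get
my $content = <<'XML';
<vxml version="1.0">
  <form id="Kategorie">
    <field name="kategorie">
      <prompt>
        <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/10.wav">xaxaxaxa</audio>
      </prompt>
      <filled>
        <if cond="kategorie == 'menue'"><goto next="../index.jsp"/>
        <elseif cond="kategorie == 'sport'"/><goto next="#Sport"/>
        </if>
      </filled>
    </field>
    <noinput count="1">
      <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/noinput.wav">noinput</audio>
    </noinput>
  </form>
</vxml>
XML

my $t = XML::Twig->new();
$t->parse($content);

# src attribute => text of the <audio> element
my %soundfiles;
foreach my $audio ($t->root->descendants('audio')) {
    my $src = $audio->att('src');
    $soundfiles{$src} = $audio->text if defined $src;
}

# next attribute => 'goto' (skip any <goto> without a next attribute)
my %links;
foreach my $goto ($t->root->descendants('goto')) {
    my $next = $goto->att('next');
    $links{$next} = 'goto' if defined $next;
}

print "audio: $_\n" foreach sort keys %soundfiles;
print "goto:  $_\n" foreach sort keys %links;
```

For very large documents, XML::Twig's twig_handlers option can process each audio or goto element as soon as it is parsed, instead of building the whole tree first.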