Re: looping over xml

Without the XML data you are running this code on, it is quite difficult to figure out what is happening.

Just a couple of remarks though. If you replace your main loop with check_tag($xso), you will get the document root; as it is now, you miss it:

my $parser = XML::Parser->new(ErrorContext => 2, Style => "Tree");
my $xso = XML::SimpleObject->new( $parser->parse($content) ) or die "could not parse!";
check_tag($xso);
...

A second remark: I hope you are dealing with data-oriented XML, as opposed to document-oriented XML. XML::SimpleObject does not deal properly with mixed content such as <p>this is <b>mixed</b> content</p> (just like XML::Simple BTW, must be the name): in this case the content of the p element will be "this is content". This is a weakness of all modules that stick to a model of the XML document where the data is stored directly within an element. You need an extra layer, with PCDATA or CDATA pseudo-elements (or text nodes) within all elements, or you cannot deal with mixed content.
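To make the difference between the two models concrete, here is a small sketch in plain Perl. The hash layouts are hand-built and purely hypothetical; they are not the internals of XML::SimpleObject, XML::Simple or XML::Twig, and the full_text helper is made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two hand-built trees for <p>this is <b>mixed</b> content</p>.
# Hypothetical layouts, for illustration only.

# Model A: text is stored directly on the element. There is a single
# slot for it, so the position of <b> inside the text is lost.
my $direct = {
    name     => 'p',
    text     => 'this is ' . ' content',
    children => [ { name => 'b', text => 'mixed', children => [] } ],
};

# Model B: text lives in '#PCDATA' pseudo-elements that sit among the
# other children, in document order.
my $layered = {
    name     => 'p',
    children => [
        { name => '#PCDATA', text => 'this is ' },
        { name => 'b', children => [ { name => '#PCDATA', text => 'mixed' } ] },
        { name => '#PCDATA', text => ' content' },
    ],
};

# Rebuild the complete text of an element by walking its children in order.
sub full_text {
    my ($node) = @_;
    return $node->{text} if $node->{name} eq '#PCDATA';
    return join '', map { full_text($_) } @{ $node->{children} };
}

print "model A: $direct->{text}\n";
print "model B: ", full_text($layered), "\n";
```

With model B the original sentence can be rebuilt, because the text pieces and the b element keep their relative order.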

If you post a minimum example of your XML data we might be able to help you more.

And now the obligatory XML::Twig version. Note that I will probably add the is_field method in the next release, as it seems to make sense for people who do data-oriented XML.

#!/bin/perl -w
use strict;

use XML::Twig;

my $twig = XML::Twig->new();
$twig->parse( \*DATA)
            or die "could not parse!";
# the #ELT argument means that we will get only the
# "real" elements, as opposed to the ones containing the text
foreach my $tag($twig->descendants( '#ELT')) {
    print $tag->gi;
    print " ", $tag->text if( is_field( $tag));
    print "\n";
  }

sub is_field
  { my $tag= shift;
    return 1 if( ($tag->children == 1) && $tag->first_child->is_text);
    return 0;
  }

__DATA__
<doc id="id1">
  <elt id="id2">elt 1</elt>
  <elt id="id3">
    <subelt id="id4">subelt 1</subelt>
    <subelt id="id5">subelt 2</subelt>
    <subelt id="id6">subelt 3</subelt>
  </elt>
  <elt id="id7">elt 3</elt>
  <elt id="id8"><subelt id="id9">subelt 4</subelt></elt>
</doc>

Re: Re: looping over xml by Micz (Beadle) on Aug 06, 2001 at 12:14 UTC
hello, thank you for your remarks. a sample of the code I am trying to parse:

<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="1.0" application="news_root.xml">
  <form id="Kategorie">
    <field name="kategorie">
      <grammar src="gram/kategorie.grammar"/>
      <prompt bargein="true">
        <audio src="http://xxx.61.xxx.xxx/audio/portal/news/10.wav">xaxaxaxa</audio>
        <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/input_start.wav">xaxaxaxa</audio>
      </prompt>
      <filled>
        <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/input_end.wav">input end</audio>
        <if cond="kategorie == 'menue'"><goto next="../index.jsp"/>
        <elseif cond="kategorie == 'schlagzeilen'"/><goto next="#Schlagzeilen"/>
        <elseif cond="kategorie == 'wirtschaft'"/><goto next="#Wirtschaft"/>
        <elseif cond="kategorie == 'boerse'"/><goto next="#Boerse"/>
        <elseif cond="kategorie == 'sport'"/><goto next="#Sport"/>
        <elseif cond="kategorie == 'unterhaltung'"/><goto next="#Unterhaltung"/>
        <else/><clear namelist="kategorie"/>123.123.123.123/nee<reprompt/></if>
      </filled>
    </field>
    <noinput count="1">
      <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/noinput.wav">noinput</audio>
      <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/40.wav">Ich habe sie wirklich nicht gehoert.</audio>
      <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/pauses/500.wav">kleine Pause</audio>
      <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/input_start.wav">input start</audio>
    </noinput>
    <noinput count="2">
      <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/noinput.wav">noinput</audio>
      <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/50.wav">Ich habe sie wirklich nicht gehoert.</audio>
      <audio src="http://xxx.xxx.xxx.xxx/audio/portal/common/pauses/500.wav">kleine Pause</audio>
      <audio src="http://xxx.xxx.xxx.xxx/audio/portal/news/input_start.wav">input start</audio>
    </noinput>
  </form>
</vxml>

I will implement the first point, and post further feedback.
If there is a well-formedness error in the code above, it is because I trimmed it a bit; the document itself is longer (and validates). A further note: what I want to end up with is all the audio and goto tags in a hash. If my approach is totally wrong, please warn me. thanks again, jan
Re: Re: Re: looping over xml by Micz (Beadle) on Aug 06, 2001 at 14:15 UTC
okay, here is more of my code. a simple link checker:

$tasks{$URL} = $URL;

while (keys(%tasks) != 0) {
    foreach $key (keys (%tasks)) {
        if (!$opt_v) { print "#"; } else { print "checking $key\n"; }
        check_page($key);
        # remove the finished task so the while loop can terminate
        delete $tasks{$key};
    }
}

sub check_page{
    unless (defined ($content = get $_[0])) {
        print "\n\n NASTY ERROR! \n\n bad link: $_[0]\n\n\n";
        return;
    }
    my $parser = XML::Parser->new(ErrorContext => 2, Style => "Tree");
    my $xso = XML::SimpleObject->new( $parser->parse($content) ) or die "could not parse!";
    check_tag($xso);

    # output the information
    foreach $file (keys %soundfiles) {
        print "\n BAD ".$file." called in $_[0]\n";
    }
    foreach $link (keys %links) {
        next if make_url($link) =~ /$_[0]/ ;
        next if $link =~ /^#/ ;
        $tasks{make_url($link)} = make_url($link);
        print "checking ".make_url($link)."\n";
    }
}

sub check_tag{
    foreach my $tag($_[0]->children()) {
        if ($tag->name =~ /audio/) { build_audio($tag); }
        if ($tag->name =~ /goto/) { $links{$tag->attribute('next')} = "goto"; }
        check_tag($tag);
    }
}

sub make_url {
    if ($_[0] =~ /http:/) { return $_[0]; }
    elsif ($_[0] =~ /^#/) { return $_[0]; }
    else { return $base_url.$_[0]; }
}

sub build_audio {
    if (!exists $soundfiles{$_[0]->attribute('src')}) {
        unless (head(make_url($_[0]->attribute('src')))) {
            $soundfiles{make_url($_[0]->attribute('src'))} = "BAD";
        }
    }
}
Re: Re: Re: Re: looping over xml by mirod (Canon) on Aug 06, 2001 at 14:57 UTC
Using XML::Twig you create and parse the twig with: `my $twig= XML::Twig->new(); $twig->parse( $content);` You can get all the `audio` elements by doing: `my @audio= $twig->descendants( 'audio')` (I'll let you figure out how to get all the `goto` elements ;--) You can then access the src attribute of each element using `$elt->att( 'src')`
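For reference, here is one way those pieces might fit together to collect every audio and goto target in a single hash. This is only a sketch under assumptions: the VoiceXML sample is a trimmed, made-up stand-in with placeholder URLs, the %found hash and its "tag name as value" layout are hypothetical, and it goes through `$twig->root` to call `descendants` on the root element.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Trimmed, made-up VoiceXML sample; the URLs are placeholders.
my $content = <<'XML';
<vxml version="1.0">
  <form id="Kategorie">
    <prompt>
      <audio src="http://example.com/audio/10.wav">intro</audio>
    </prompt>
    <filled>
      <if cond="kategorie == 'menue'"><goto next="../index.jsp"/>
      <elseif cond="kategorie == 'sport'"/><goto next="#Sport"/></if>
    </filled>
  </form>
</vxml>
XML

my $twig = XML::Twig->new();
$twig->parse($content);

# %found maps each target (src or next attribute) to the tag it came from.
my %found;
foreach my $elt ($twig->root->descendants('audio')) {
    $found{ $elt->att('src') } = 'audio';
}
foreach my $elt ($twig->root->descendants('goto')) {
    $found{ $elt->att('next') } = 'goto';
}

print "$found{$_} $_\n" for sort keys %found;
```

Keying the hash on the target means duplicates collapse automatically, which suits a link checker that only needs to test each URL once.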
Re: Re: Re: Re: Re: looping over xml by Micz (Beadle) on Aug 07, 2001 at 12:59 UTC