Without the XML data you are running this code on, it is
quite difficult to figure out what is happening.
Just a couple of remarks though: if you replace
your main loop with check_tag( $xso), you will get
the document root; as it stands, you miss it:
my $parser = XML::Parser->new(ErrorContext => 2, Style => "Tree");
my $xso = XML::SimpleObject->new( $parser->parse($content) ) or die "could not parse!";
check_tag($xso);
...
A second remark: I hope you are dealing with data-oriented XML, as opposed to document-oriented XML.
XML::SimpleObject does not deal properly with mixed content such as <p>this is <b>mixed</b> content</p> (neither does XML::Simple, BTW; it must be the name): in this case the content of the p element will be "this is content". This is a weakness of all modules that stick to a model of the XML document where the data is stored directly within an element. You need an extra layer, with PCDATA or CDATA pseudo-elements (or text nodes) inside every element, or you cannot deal with mixed content.
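To make the "extra layer" idea concrete, here is a minimal sketch in plain Perl, with no XML module involved. The hash layout, the '#PCDATA' tag name and the node_text / direct_text helpers are all made up for illustration; they are not the internals of XML::SimpleObject, XML::Simple or XML::Twig. It only shows why keeping each piece of text as a child node of its own preserves mixed content, while looking at the direct text alone loses part of it.

```perl
use strict;
use warnings;

# Hand-built model of <p>this is <b>mixed</b> content</p>.
# Every run of character data is a child node of its own (hypothetical
# '#PCDATA' tag), so the order of text and sub-elements is preserved.
my $p = {
    tag      => 'p',
    children => [
        { tag => '#PCDATA', text => 'this is ' },
        { tag => 'b', children => [ { tag => '#PCDATA', text => 'mixed' } ] },
        { tag => '#PCDATA', text => ' content' },
    ],
};

# Full text of a node: concatenate all text nodes below it, in document order.
sub node_text {
    my ($node) = @_;
    return $node->{text} if $node->{tag} eq '#PCDATA';
    return join '', map { node_text($_) } @{ $node->{children} };
}

# Text held directly by the element: only its own text children,
# ignoring anything inside sub-elements.
sub direct_text {
    my ($node) = @_;
    return join '',
        map  { $_->{text} }
        grep { $_->{tag} eq '#PCDATA' } @{ $node->{children} };
}

print node_text($p),   "\n";   # prints "this is mixed content"
print direct_text($p), "\n";   # the text inside <b> is missing here
```

With text nodes in the tree you can always rebuild the full content in order; without them the module has to pick one string per element, and the text inside <b> has nowhere to go.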
If you post a minimal example of your XML data, we might be able to help you more.
And now the obligatory XML::Twig version. Note that I will probably add the is_field method in the next release, as it seems to make sense for people who do data-oriented XML.
#!/bin/perl -w
use strict;
use XML::Twig;
my $twig = XML::Twig->new();
$twig->parse( \*DATA)
or die "could not parse!";
# the #ELT argument means that we will get only the
# "real" elements, as opposed to the ones containing the text
foreach my $tag ($twig->descendants( '#ELT')) {
    print $tag->gi;
    print " ", $tag->text if( is_field( $tag));
    print "\n";
}

# a "field" is an element whose only child is a text node
sub is_field
  { my $tag= shift;
    return 1 if( ($tag->children == 1) && $tag->first_child->is_text);
    return 0;
  }
__DATA__
<doc id="id1">
<elt id="id2">elt 1</elt>
<elt id="id3">
<subelt id="id4">subelt 1</subelt>
<subelt id="id5">subelt 2</subelt>
<subelt id="id6">subelt 3</subelt>
</elt>
<elt id="id7">elt 3</elt>
<elt id="id8"><subelt id="id9">subelt 4</subelt></elt>
</doc>