
Of course not:

What if the XML includes this: <!-- <tag>this tag commented out</tag> -->? Though this might look contrived, it actually appears in the XML source of the XML recommendation itself.
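To make the comment problem concrete, here is a minimal core-Perl sketch (no XML modules) of what happens to a purely text-based count. The sample document, the `naive_unique_tags` helper and its regex are hypothetical stand-ins for a regex approach, not code from the earlier posts.

```perl
use strict;
use warnings;

# Hypothetical input: two real elements plus one that only exists inside a comment.
my $xml = '<doc><!-- <tag>this tag commented out</tag> --><item/></doc>';

# Naive count (assumed approach): treat any "<" followed by a name character
# as a start tag. "<!--" and "</" are skipped because "!" and "/" are not
# name characters, but the text inside the comment is still scanned.
sub naive_unique_tags {
    my ($text) = @_;
    my %seen;
    $seen{$1}++ while $text =~ /<([A-Za-z_][\w.:-]*)/g;
    return sort keys %seen;
}

my @tags = naive_unique_tags($xml);
print "@tags\n";    # prints "doc item tag": "tag" only appears inside the comment
```

Stripping comments first would fix this particular input, but CDATA sections (whose content may contain unescaped `<`) would still fool it; a real parser avoids the whole class of problems.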

Then what if the XML is:

<!DOCTYPE foo SYSTEM "foo.dtd" []> <foo>&bar;</foo>
You have no idea what's inside the entity: it could be just text, or it could include 278 unique tags. Note that this breaks all 3 pieces of code above, as XML::Parser (and thus pyx) does not expand external entities. The easiest solution I found uses... XML::Twig, as usual!

perl -MXML::Twig -e'XML::Twig->new( expand_external_ents => 1)->parsefile( shift )->print'

prints the document with its external entities expanded; feed that output to the regular pyx or XML::Parser solution and it will work.
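Why the expansion step changes the count can be shown without any XML modules. The following is a toy core-Perl illustration, not how XML::Twig works internally: the replacement text given to `bar` is made up (the real text lives in whatever file foo.dtd points the entity at), the `%entity` table plus the `s///e` substitution is a crude stand-in for entity expansion, and `unique_tags` is a hypothetical naive regex counter.

```perl
use strict;
use warnings;

# Hypothetical naive counter: any "<" followed by a name character is a start tag.
sub unique_tags {
    my ($text) = @_;
    my %seen;
    $seen{$1}++ while $text =~ /<([A-Za-z_][\w.:-]*)/g;
    return sort keys %seen;
}

# The document from the example, with the entity left unexpanded.
my $doc = '<!DOCTYPE foo SYSTEM "foo.dtd" []> <foo>&bar;</foo>';

# Assumed replacement text for &bar; (made up for this illustration).
my %entity = ( bar => '<a><b/><c>text</c></a>' );

my @before = unique_tags($doc);

# Crude expansion: substitute known entity references, leave unknown ones alone.
(my $expanded = $doc) =~ s/&(\w+);/exists $entity{$1} ? $entity{$1} : "&$1;"/ge;
my @after = unique_tags($expanded);

print "before: @before\n";    # before: foo
print "after:  @after\n";     # after:  a b c foo
```

The count before expansion only sees `foo`; everything the entity contributes shows up only once its text has been substituted into the document.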