in reply to Count number of unique tags in XML files

Almost uniquely with XML this is a case where a regex would also work:

perl -n -e 'while(s/<([\w\d\:]+)//){$f{$1}++;} END{print map "$_\n", sort keys %f; print "There are ", scalar(keys %f), " tags in the files\n"}' *.xml

Assuming well-formed XML, of course.

Re: Re: Count number of unique tags in XML files
by mirod (Canon) on Apr 09, 2004 at 17:15 UTC

    Of course not:

    What if the XML includes this: <!-- <tag>this tag commented out</tag> -->? Though this might look contrived, you can actually find it in the XML for the XML recommendation itself.

    Then what if the XML is:

    <!DOCTYPE foo SYSTEM "foo.dtd" []> <foo>&bar;</foo>
    You have no idea what's inside the entity: it could be just text, or it could include 278 unique tags. Note that this breaks all 3 pieces of code above, as XML::Parser (and thus pyx) do not expand external entities. The easiest solution I found uses... XML::Twig, as usual!

    perl -MXML::Twig -e'XML::Twig->new( expand_external_ents => 1)->parsefile( shift )->print'

    will expand external entities, and then the regular pyx or XML::Parser solution will work.