wc_xml
Just counts the words in an XML file,
excluding all markup (and attribute values).
You will need pyx installed (from XML::PYX, or the
Python or Java version; it really doesn't matter which).
Adding a character count, so that it behaves more like the
Unix wc utility, is left as an exercise for the reader.
#!/bin/perl -w
use strict;
my $nbw=0;
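foreach my $file (@ARGV)
  { # list-form pipe open: the file name reaches pyx without going through a shell
    open( my $pyx, '-|', 'pyx', $file) or die "cannot run pyx on $file: $!";
    while( <$pyx>)
      { next unless s/^-//;   # skip markup, strip the leading '-' of character data
        s/\\n/ /g;            # line returns are written as \n in PYX: treat them as spaces
        my @words= split;     # get the words
        $nbw+= @words;        # add the number of words in the line
      }
    close $pyx;
  }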
print $nbw, " words\n";
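
The character count left as an exercise above could be sketched along the following lines. This is only one possible approach, not the original author's solution. The helper name count_pyx is hypothetical, and the sketch assumes that \n is the only escape sequence that needs decoding in PYX character data. It works on a list of PYX lines rather than on a pipe, so it can be tried without pyx installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_pyx is a hypothetical helper: it takes a list of PYX lines and
# returns (words, characters), counted over the character data only
sub count_pyx
  { my( $nbw, $nbc)= (0, 0);
    foreach my $line (@_)
      { next unless $line=~ m/^-/;     # skip markup
        my $text= substr( $line, 1);   # drop the leading '-'
        chomp $text;                   # drop the end of the PYX record itself
        $text=~ s/\\n/\n/g;            # assumption: \n is the only escape to decode
        $nbc+= length $text;           # characters, line returns included
        my @words= split ' ', $text;   # whitespace-separated words
        $nbw+= @words;
      }
    return( $nbw, $nbc);
  }

# hand-written sample PYX lines (illustrative, not captured from a real pyx run)
my @pyx= ( "(doc\n", "Aa v\n", "-hello world\n", "-\\n\n",
           "(b\n", "-bye\n", ")b\n", ")doc\n");
my( $words, $chars)= count_pyx( @pyx);
print "$words words, $chars characters\n";   # prints "3 words, 15 characters"
```

In the script itself, the same counting could be applied to the lines read from the pyx pipe. Note that length counts bytes unless the input has been decoded, so non-ASCII text would need a decoding layer on the filehandle.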