PerlMonks  

XML word count

by mirod (Canon)
on Aug 31, 2000 at 03:22 UTC (id://30446, sourcecode)
Category: utilities (XML?)
Description:

wc_xml just counts the words in an XML file, excluding all markup (and attribute values).

You will need pyx installed (from XML::PYX, or the Python or Java version; it really doesn't matter which).

Adding a character count so it behaves more like the Unix wc utility is left as an exercise for the reader.

#!/bin/perl -w
 use strict;

 my $nbw=0;

 foreach my $file (@ARGV)
   { open( XML, '-|', 'pyx', $file) or die "cannot run pyx on $file: $!"; # list form: no shell involved
     while( <XML>)
       { next unless m/^-/;   # skip markup
         next if( m/^-\\n$/); # skip line returns
         s/^-//;              # drop the PYX text marker so it is not counted as a word
         my @words= split;    # get the words
         $nbw+= @words;       # add the number of words in the line
       }
     close XML;
   }
 print $nbw, " words\n";
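
For the exercise mentioned in the description, here is one possible sketch of a character count, not a definitive answer. It assumes that pyx prefixes text lines with "-" and writes a newline inside text as a literal \n. The count_pyx helper and the sample lines are made up for illustration, so the block runs without pyx installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): takes a reference
# to an array of PYX lines and returns (word count, character count).
sub count_pyx
  { my( $lines)= @_;
    my( $nbw, $nbc)= (0, 0);
    foreach my $line (@$lines)
      { my $text= $line;
        chomp $text;
        next unless $text=~ s/^-//;   # keep only text lines, drop the marker
        $text=~ s/\\n/\n/g;           # assumption: pyx writes a newline as \n
        $nbc+= length $text;          # characters in the text, newlines included
        my @words= split ' ', $text;  # split on whitespace, ignoring leading blanks
        $nbw+= @words;
      }
    return( $nbw, $nbc);
  }

# Made-up PYX sample: an element, an attribute, three text lines, a line return.
my @sample= ( "(doc\n",
              "Atitle A word count\n",
              "-Hello world\n",
              "-\\n\n",
              "-  \n",
              "-bye\n",
              ")doc\n",
            );
my( $words, $chars)= count_pyx( \@sample);
print "$words words, $chars characters\n";   # prints "3 words, 17 characters"
```

To use it on real files, feed it the lines read from the pyx pipe instead of @sample. Note that length counts bytes here unless the input is decoded first, which matches wc -c rather than wc -m.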