word count

targetsmart has asked for the wisdom of the Perl Monks concerning the following question:

I need a solution for this problem. I have a text like this:

Usage: perl [switches] [--] [programfile] [arguments]
  -0[octal]       specify record separator (\0, if no argument)
  -a              autosplit mode with -n or -p (splits $_ into @F)
  -C[number/list] enables the listed Unicode features
  -c              check syntax only (runs BEGIN and CHECK blocks)
  -d[:debugger]   run program under debugger

I need a count of the words in it, but not with 'wc -w' this time, because wc -w gives me 45.
I need the result to be 47 (which means a word should be counted only if it contains a word character).

I have found one answer for this:

#!/usr/bin/perl -n
s/[^\w]+/ /g;     # replace the non words with space
next if(/^\s*$/); # discard sentence with only spaces
$totalcount += (split(/ /) - 1); # split the sentence using space and count it
END{ print "Total: $totalcount<<\n"; }

Are there any other answers?


Replies are listed 'Best First'.
Re: word count by pc88mxer (Vicar) on Jun 03, 2008 at 16:14 UTC
Your word count should be 48, not 47. The problem is with this line:

    $totalcount += (split(/ /) - 1);

If a line begins with whitespace, then subtracting one is correct, but otherwise it isn't. A better way to do this is:

    $totalcount += split(' ');

and then you don't have to check for blank lines. Another simple way to perform the count:

    #!/usr/bin/perl -n
    while (m/\w+/g) { $count++ }
    END { print "count: $count\n" }
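For anyone following along, here is a minimal sketch of why the `- 1` only works on some lines. The two helper names are made up for illustration; the substitution and the two forms of split are the ones from the posts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: squash non-word runs to one space, then split on a
# literal single space, as the original script does.
sub fields_single_space {
    my ($line) = @_;
    (my $copy = $line) =~ s/[^\w]+/ /g;
    my @fields = split / /, $copy;   # keeps an empty leading field
    return scalar @fields;
}

# Hypothetical helper: same squashing, but split ' ' skips leading
# whitespace, so no empty leading field is produced.
sub fields_awk_mode {
    my ($line) = @_;
    (my $copy = $line) =~ s/[^\w]+/ /g;
    my @fields = split ' ', $copy;
    return scalar @fields;
}

for my $line ("Usage: perl [switches]\n", "  -a  autosplit mode\n") {
    printf "%d %d\n", fields_single_space($line), fields_awk_mode($line);
}
```

This should print `3 3` for the first line and `4 3` for the second. Only the line that starts with a non-word character gets the extra empty field, so subtracting one undercounts every line that starts with a word.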
Re^2: word count by graff (Chancellor) on Jun 04, 2008 at 01:00 UTC
Fore! (I couldn't resist :)

    #!/usr/bin/perl -ln
    $t+=@a=/\w+/g;END{print$t}

47 bytes, counting the shebang and two LFs; 26 bytes for the one-line script by itself. And it gives the correct answer, too (48 "words" for the input text in question).
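An ungolfed sketch of the same idea, for readers who don't golf: a list assignment evaluated in scalar context yields the number of elements on its right-hand side, which is what `$t+=@a=/\w+/g` relies on. The helper name `count_word_runs` and the two hard-coded sample lines are illustrative only; they are taken from the text in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: number of \w+ runs in one line of text.
sub count_word_runs {
    my ($line) = @_;
    # The match runs in list context and returns every \w+ run; assigning
    # that list to "()" in scalar context gives the number of runs.
    my $n = () = $line =~ /\w+/g;
    return $n;
}

# Two lines from the question, single-quoted so $_ and @F stay literal.
my @lines = (
    'Usage: perl [switches] [--] [programfile] [arguments]',
    '  -a              autosplit mode with -n or -p (splits $_ into @F)',
);

my $total = 0;
$total += count_word_runs($_) for @lines;
print "$total\n";
```

This should print 16 (5 runs in the first line, 11 in the second; note that the `_` in `$_` counts as a word character).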
Re: word count by moritz (Cardinal) on Jun 03, 2008 at 15:23 UTC
IMHO the result from `wc` is correct, unless you define exactly what you mean by word character. Maybe you'll like this better than wc:

    #!/usr/bin/perl
    use strict;
    use warnings;
    my $count = 0;
    while (<DATA>){
        $count++ while m/[a-zA-Z]\w*/g;
    }
    print $count, $/;
    __DATA__
    Usage: perl [switches] [--] [programfile] [arguments]
      -0[octal]       specify record separator (\0, if no argument)
      -a              autosplit mode with -n or -p (splits $_ into @F)
      -C[number/list] enables the listed Unicode features
      -c              check syntax only (runs BEGIN and CHECK blocks)
      -d[:debugger]   run program under debugger

This searches for words beginning with a Latin letter. If you want to match word characters belonging to other languages, consider the regex `m/\pL\w*/g` instead.
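Since the answer depends on the definition of "word", a rough side-by-side sketch may help. It counts the sample text under three definitions that come up in this thread: whitespace-separated tokens (what `wc -w` counts), runs of word characters, and runs that start with an ASCII letter. The helper `count_matches` and the labels are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text from the question; the single-quoted heredoc keeps \0, $_
# and @F literal.
my $text = <<'END_TEXT';
Usage: perl [switches] [--] [programfile] [arguments]
  -0[octal]       specify record separator (\0, if no argument)
  -a              autosplit mode with -n or -p (splits $_ into @F)
  -C[number/list] enables the listed Unicode features
  -c              check syntax only (runs BEGIN and CHECK blocks)
  -d[:debugger]   run program under debugger
END_TEXT

# Hypothetical helper: number of non-overlapping matches of a pattern.
sub count_matches {
    my ($string, $pattern) = @_;
    my $n = () = $string =~ /$pattern/g;
    return $n;
}

# Label/pattern pairs, kept in an array so the output order is fixed.
my @definitions = (
    [ 'whitespace tokens' => qr/\S+/ ],
    [ 'word-char runs'    => qr/\w+/ ],
    [ 'letter-initial'    => qr/[a-zA-Z]\w*/ ],
);

for my $def (@definitions) {
    my ($label, $pattern) = @$def;
    printf "%-18s %d\n", $label, count_matches($text, $pattern);
}
```

On the sample text this should report 45, 48 and 45. The third definition skips the two `0`s and the `_` that `\w+` counts, which happens to bring it back to the same total as the whitespace tokens.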