targetsmart has asked for the wisdom of the Perl Monks concerning the following question:

I need a solution for this problem. I have a text like this
Usage: perl [switches] [--] [programfile] [arguments]
  -0[octal]         specify record separator (\0, if no argument)
  -a                autosplit mode with -n or -p (splits $_ into @F)
  -C[number/list]   enables the listed Unicode features
  -c                check syntax only (runs BEGIN and CHECK blocks)
  -d[:debugger]     run program under debugger

I need the number of words in it, but not from 'wc -w' this time, because wc -w gives me 45.
I need the result to be 47 (that is, a word should be counted only if it contains a word character).
I have found one answer for this:
#!/usr/bin/perl -n
s/[^\w]+/ /g;                     # replace the non words with space
next if(/^\s*$/);                 # discard sentence with only spaces
$totalcount += (split(/ /) - 1);  # split the sentence using space and count it
END{ print "Total: $totalcount<<\n"; }
Are there any other answers?

Replies are listed 'Best First'.
Re: word count
by pc88mxer (Vicar) on Jun 03, 2008 at 16:14 UTC
    Your word count should be 48 not 47. The problem is with this line:
    $totalcount += (split(/ /) - 1);
    If a line begins with whitespace, then subtracting one is correct, but otherwise it isn't. A better way to do this is:
    $totalcount += split(' ');
    and then you don't have to check for blank lines.
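
    To make the difference between the two split forms concrete, here is a minimal sketch. The helper name count_both and the two sample lines are made up for illustration; it applies the same substitution as the original script and then counts fields both ways.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical helper: apply the original script's cleanup, then count
    # fields with both split forms so they can be compared side by side.
    sub count_both {
        my ($line) = @_;
        (my $clean = $line) =~ s/[^\w]+/ /g;   # each non-word run becomes one space
        my @by_space = split / /, $clean;      # keeps a leading empty field
        my @awk_mode = split ' ', $clean;      # skips leading whitespace
        return (scalar @by_space, scalar @awk_mode);
    }

    for my $line ("Usage: perl [switches]\n", "  -c check syntax only\n") {
        my ($by_space, $awk_mode) = count_both($line);
        print "split(/ /): $by_space fields, split(' '): $awk_mode fields\n";
    }
    ```

    The first line yields 3 fields either way, so subtracting one undercounts it. The second yields 5 fields with split(/ /) because of the leading empty field, and 4 with split(' ').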

    Another simple way to perform the count:

    #!/usr/bin/perl -n
    while (m/\w+/g) { $count++ }
    END { print "count: $count\n" }
      Fore! (I couldn't resist :)
      #!/usr/bin/perl -ln
      $t+=@a=/\w+/g;END{print$t}
      47 bytes, counting the shebang and two LFs; 26 bytes for the one-line script by itself. And it gives the correct answer, too (48 "words" for the input text in question).
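
      The two totals mentioned in this thread, 45 from wc -w and 48 from counting \w+ runs, can be reproduced with a short sketch. The helper names count_tokens and count_word_runs are hypothetical, and split(' ') is used here only as a stand-in for how wc -w separates words on whitespace.

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;

      # The sample text from the question, one usage line per element.
      # Single quotes keep \0, $_ and @F literal.
      my @lines = (
          'Usage: perl [switches] [--] [programfile] [arguments]',
          '  -0[octal] specify record separator (\0, if no argument)',
          '  -a autosplit mode with -n or -p (splits $_ into @F)',
          '  -C[number/list] enables the listed Unicode features',
          '  -c check syntax only (runs BEGIN and CHECK blocks)',
          '  -d[:debugger] run program under debugger',
      );

      # Hypothetical helper: count whitespace-separated tokens.
      sub count_tokens {
          my $n = 0;
          for my $line (@_) {
              my @fields = split ' ', $line;
              $n += @fields;
          }
          return $n;
      }

      # Hypothetical helper: count maximal runs of word characters.
      sub count_word_runs {
          my $n = 0;
          for my $line (@_) {
              my @runs = $line =~ /\w+/g;
              $n += @runs;
          }
          return $n;
      }

      printf "whitespace tokens: %d\n", count_tokens(@lines);
      printf "word-character runs: %d\n", count_word_runs(@lines);
      ```

      This prints 45 and 48. Tokens such as [--] contain no word character and drop out of the second count, while -C[number/list] counts once as a token but contributes three runs.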
Re: word count
by moritz (Cardinal) on Jun 03, 2008 at 15:23 UTC
    IMHO the result from wc is correct, unless you define exactly what you mean by word character.

    Maybe you'll like this better than wc:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $count = 0;
    while (<DATA>){
        $count++ while m/[a-zA-Z]\w*/g;
    }
    print $count, $/;

    __DATA__
    Usage: perl [switches] [--] [programfile] [arguments]
      -0[octal]         specify record separator (\0, if no argument)
      -a                autosplit mode with -n or -p (splits $_ into @F)
      -C[number/list]   enables the listed Unicode features
      -c                check syntax only (runs BEGIN and CHECK blocks)
      -d[:debugger]     run program under debugger

    This searches for words beginning with a Latin letter (a-z or A-Z). If you want to match words that begin with a letter from any other script, consider the regex m/\pL\w*/g instead.
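
    As a rough illustration of the difference between the two patterns, the sketch below counts matches in a made-up sample string that contains one Cyrillic word, written with \x{...} escapes so the script does not depend on the file's encoding. The helper names count_latin and count_letters are hypothetical.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical helpers: count matches of each pattern in a string.
    sub count_latin {
        my ($s) = @_;
        my $n = () = $s =~ /[a-zA-Z]\w*/g;   # words starting with a-z or A-Z
        return $n;
    }

    sub count_letters {
        my ($s) = @_;
        my $n = () = $s =~ /\pL\w*/g;        # words starting with any Unicode letter
        return $n;
    }

    # Made-up sample: "perl", a five-letter Cyrillic word, and "-0[octal]".
    my $text = "perl \x{441}\x{43b}\x{43e}\x{432}\x{43e} -0[octal]";

    print "Latin-initial words:  ", count_latin($text), "\n";
    print "letter-initial words: ", count_letters($text), "\n";
    ```

    This prints 2 and 3: both patterns skip the bare "0", but only \pL picks up the Cyrillic word.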