in reply to Word incidence count

#!/usr/bin/env perl
You know, there have been several discussions as to whether this really is more system-independent than, say, /usr/bin/perl, and they basically turn into flame wars. This is not terribly Perl-specific, but it comes up from time to time...
use strict; use warnings;
Good!
use Data::Dumper qw( Dumper );
my $string = "Hello World!\n Oh poor Yorick, his world I knew well yes I did";
my @words = split( /\W+/, $string);
Well, no harm done, but split is for... ehm... splitting. Here, rather than matching on non-words to discard them, you may want to match on words to gather them:
my @words = /\w+/g;
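To make the difference concrete, here is a minimal sketch, not the OP's code. The variable $text and its contents are made up for illustration, and the match is bound explicitly with =~ instead of relying on $_. It shows one case where the two approaches disagree: a string that starts with non-word characters.

```perl
use strict;
use warnings;

# Hypothetical sample text (not the OP's string); note the leading spaces.
my $text = "  Hello, World! Hello again.";

# split throws away the separators, but it keeps a leading empty field
# when the string starts with a separator (trailing empty fields are dropped).
my @from_split = split /\W+/, $text;

# A list-context match with /g gathers the words themselves.
my @from_match = $text =~ /\w+/g;

print scalar(@from_split), " fields from split: [", join( '|', @from_split ), "]\n";
print scalar(@from_match), " words from match: [", join( '|', @from_match ), "]\n";
```

With this input, split returns five fields, the first of which is empty, while the match returns just the four words.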
<SNIP>
print "Word count: ", Dumper(%count);
1;
Huh?!? The trailing 1; is not needed, by any means. It's used in modules for a well-defined reason (a file loaded with require or use must return a true value), which is not relevant here. Yours is simply a script...
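For the curious, a rough sketch of where the true value does matter. Everything here is made up for illustration: the module names DemoGood and DemoBad and the helper write_module are hypothetical. It writes two throwaway module files to a temporary directory and tries to require each, assuming perl's default behaviour for files loaded this way.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Write a throwaway module file whose last statement is $last_statement.
# The helper and the module names are hypothetical, for illustration only.
sub write_module {
    my ( $dir, $name, $last_statement ) = @_;
    my $path = "$dir/$name.pm";
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} "package $name;\n$last_statement\n";
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $dir  = tempdir( CLEANUP => 1 );
my $good = write_module( $dir, 'DemoGood', '1;' );
my $bad  = write_module( $dir, 'DemoBad',  '0;' );

# By default, require dies if the last statement evaluated in the file is false.
my $good_ok = eval { require $good; 1 } ? 1 : 0;
my $bad_ok  = eval { require $bad;  1 } ? 1 : 0;

print "DemoGood loaded: $good_ok\n";
print "DemoBad loaded:  $bad_ok\n";
```

A script that is run directly is never loaded through require, so nothing ever looks at its final value.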

All in all, well done!

Re^2: Word incidence count
by holli (Abbot) on May 17, 2005 at 08:27 UTC
    Well, no harm done, but split is for... ehm... splitting.
    This usage of split is perfectly valid and appropriate.
    my @words = /\w+/g;
    Shouldn't that be @words =~ /\w+/g; ?

    Which of them is faster depends on the data.


    holli, /regexed monk/
      If you're talking about a plain text document:
      print `wc -w /path/to/file.txt`;
      cLive ;-)
      my @words = /\w+/g;
      Shouldn't that be @words =~ /\w+/g; ?
      No. But it does implicitly assume that the string to be matched is in $_, which is where it usually is in my code but not in the OP's example. The cure is lightweight, though:
      @words = $string =~ /\w+/g;
      Which of them is faster depends on the data.
      Please note that I didn't speak of speed. I was talking about conceptual clarity: saying what it is that you want, as opposed to saying what it is that you do not want.
        No. But it does implicitly assume that the string to be matched is in $_, which is where it usually is in my code but not in the OP's example. The cure is lightweight, though:
        Yes of course. My bad.


        holli, /regexed monk/