in reply to How much can this text processing be optimized?

How does this perform?
    use strict;
    use warnings;

    my %h;
    while (<DATA>) {
        $h{$_}++ for map { lc } grep { /^[a-zA-Z]+$/ } split /\W+/;
    }
    for (sort keys %h) {
        printf "%10s %05d\n", $_, $h{$_};
    }
    __DATA__
    This line contains garbag3, and words!
    This line does not.
Note: You will need to add your special characters to the character class used in the grep block, and alter my script to read from STDIN.
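A minimal sketch of that STDIN variant, with the same counting logic piped the two sample lines from the DATA section (the inline input here is just for demonstration):

```shell
# Feed the sample lines to the counter via STDIN instead of the DATA section.
printf 'This line contains garbag3, and words!\nThis line does not.\n' |
perl -e '
    use strict; use warnings;
    my %h;
    while (<STDIN>) {
        # split into word-ish tokens, keep purely alphabetic ones, lowercase, count
        $h{$_}++ for map { lc } grep { /^[a-zA-Z]+$/ } split /\W+/;
    }
    printf "%10s %05d\n", $_, $h{$_} for sort keys %h;
'
```

In practice you would replace the printf pipeline with `perl script.pl < yourfile.txt`.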

Update:

Output:
           and 00001
      contains 00001
          does 00001
          line 00002
           not 00001
          this 00002
         words 00001
Update:

Running this on a 1.5 MB file containing the __DATA__ section repeated takes <1s on my system. (1.4 GHz AMD, Perl 5.8, Win XP)
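One way to reproduce that kind of timing, assuming the input file is built by repeating the two sample lines (26000 repetitions of 59 bytes comes to about 1.5 MB; the filename big.txt is arbitrary):

```shell
# Build a ~1.5 MB input by repeating the two sample lines.
perl -e 'print "This line contains garbag3, and words!\nThis line does not.\n" for 1..26000' > big.txt

# Time the counting loop against it, discarding the report itself.
time perl -e '
    my %h;
    while (<STDIN>) {
        $h{$_}++ for map { lc } grep { /^[a-zA-Z]+$/ } split /\W+/;
    }
    printf "%10s %05d\n", $_, $h{$_} for sort keys %h;
' < big.txt > /dev/null
```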


holli, /regexed monk/

Replies are listed 'Best First'.
Re^2: How much can this text processing be optimized?
by YAFZ (Pilgrim) on May 16, 2005 at 14:15 UTC
    It took about 1.1 sec. with my command-line redirection version (using your adapted code) and reminded me of Common Lisp. Now I'm a happier programmer; thanks for reminding me of the powerful map! ;-)