in reply to How much can this text processing be optimized?

How does this perform?
    use strict;
    use warnings;

    my %h;
    while (<DATA>) {
        $h{$_}++ for map { lc } grep { /^[a-zA-Z]+$/ } split /\W+/;
    }
    for (sort keys %h) {
        printf "%10s %05d\n", $_, $h{$_};
    }
    __DATA__
    This line contains garbag3, and words!
    This line does not.
Note: You will need to add your special characters to the character class used in the grep block, and alter my script to read from STDIN.
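A minimal sketch of that STDIN variant, with the same counting logic piped the two sample lines from the DATA section (the inline input here is just for demonstration):

```shell
# Feed the sample lines to the counter via STDIN instead of the DATA section.
printf 'This line contains garbag3, and words!\nThis line does not.\n' |
perl -e '
    use strict; use warnings;
    my %h;
    while (<STDIN>) {
        # split into word-ish tokens, keep purely alphabetic ones, lowercase, count
        $h{$_}++ for map { lc } grep { /^[a-zA-Z]+$/ } split /\W+/;
    }
    printf "%10s %05d\n", $_, $h{$_} for sort keys %h;
'
```

In practice you would replace the printf pipeline with `perl script.pl < yourfile.txt`.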

Update:

Output:
           and 00001
      contains 00001
          does 00001
          line 00002
           not 00001
          this 00002
         words 00001
Update:

Running this on a 1.5 MB file containing the __DATA__ section repeated takes <1s on my system. (1.4 GHz AMD, Perl 5.8, Win XP)
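One way to reproduce that kind of timing, assuming the input file is built by repeating the two sample lines (26000 repetitions of 59 bytes comes to about 1.5 MB; the filename big.txt is arbitrary):

```shell
# Build a ~1.5 MB input by repeating the two sample lines.
perl -e 'print "This line contains garbag3, and words!\nThis line does not.\n" for 1..26000' > big.txt

# Time the counting loop against it, discarding the report itself.
time perl -e '
    my %h;
    while (<STDIN>) {
        $h{$_}++ for map { lc } grep { /^[a-zA-Z]+$/ } split /\W+/;
    }
    printf "%10s %05d\n", $_, $h{$_} for sort keys %h;
' < big.txt > /dev/null
```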


holli, /regexed monk/

Replies are listed 'Best First'.
Re^2: How much can this text processing be optimized?
by YAFZ (Pilgrim) on May 16, 2005 at 14:15 UTC
    It took about 1.1 sec. with my command-line redirection version (using your adapted code) and reminded me of Common Lisp. Now I'm a happier programmer; thanks for reminding me of the powerful map! ;-)