in reply to How much can this text processing be optimized?
Right now you're pulling the entire file into a scalar, stripping the junk, lowercasing the words, and then counting them. Do this instead (just pseudocode, since I can't really read your regexes):
    while (<STDIN>) {
        s/remove junk strings//og;        # strip the junk (your regex here)
        my @words = split(/\s+/, $_);     # split the line into words
        foreach my $w (@words) {
            $MyWordCount{lc($w)}++;       # count case-insensitively
        }
    }
I would imagine that is faster than what you are currently doing, because this way you aren't accumulating the whole file into one big scalar before processing it.
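Here is a minimal runnable sketch of the same streaming idea as a one-liner; the punctuation-stripping regex is just a stand-in for your actual junk-removal pattern, and the sample input is made up for illustration:

    printf 'Hello, world!\nhello again\n' |
    perl -ne '
        s/[[:punct:]]+//g;                 # stand-in for your junk-removal regex
        $count{lc $_}++ for split /\s+/;   # lowercase and tally each word
        END { print "$_ $count{$_}\n" for sort keys %count }
    '
    # prints:
    # again 1
    # hello 2
    # world 1

Because `-n` wraps the body in a `while (<>)` loop, only one line is ever held in memory at a time, no matter how large the input file is.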
Frank Wiles <frank@wiles.org>
http://www.wiles.org
Re^2: How much can this text processing be optimized?
by YAFZ (Pilgrim) on May 16, 2005 at 14:13 UTC