in reply to How much can this text processing be optimized?

Right now you're pulling the entire file into one scalar, removing the junk, lowercasing the words, and then counting them. Try this instead (just pseudocode, since I can't really read your regexes):

    while (<STDIN>) {
        s/remove junk strings//g;          # strip the junk
        my @words = split(/\s+/, $_);      # split into words
        foreach my $w (@words) {
            $MyWordCount{ lc($w) }++;      # count case-insensitively
        }
    }

I would imagine that's faster than what you're currently doing, since this way you aren't accumulating everything into one big scalar before processing it.
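To make that concrete, here's a minimal runnable sketch of the same idea. The junk-stripping pattern ([^\w\s]+) and the hash name %word_count are only placeholders for illustration, not your actual regexes:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %word_count;
    while (my $line = <STDIN>) {
        $line =~ s/[^\w\s]+//g;              # strip "junk" (placeholder pattern)
        for my $w (split /\s+/, $line) {     # split the line into words
            next unless length $w;           # skip empty tokens from leading whitespace
            $word_count{ lc $w }++;          # count case-insensitively
        }
    }

    # print the counts, most frequent first
    for my $w (sort { $word_count{$b} <=> $word_count{$a} } keys %word_count) {
        print "$word_count{$w}\t$w\n";
    }

The point is that each line is cleaned, split, and counted before the next one is read, so memory stays flat regardless of file size.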

Frank Wiles <frank@wiles.org>
http://www.wiles.org

Re^2: How much can this text processing be optimized?
by YAFZ (Pilgrim) on May 16, 2005 at 14:13 UTC
    Much better now! I was thinking along similar lines about the issue of putting everything into a huge scalar at once, and trying your solution gave about a 2 sec. runtime. The C# code does one thing more, separating each word into syllables, but that's another story. Thanks for the hint.