in reply to How much can this text processing be optimized?

Right now you're pulling the entire file into one scalar, removing the junk, lowercasing the words, and then counting them. Try this instead (just pseudocode, since I can't really read your regexes):

    while (<STDIN>) {
        s/remove junk strings//g;          # strip the junk
        my @words = split(/\s+/, $_);      # split into words
        foreach my $w (@words) {
            $MyWordCount{ lc($w) }++;      # count case-insensitively
        }
    }

I would imagine that's faster than what you're currently doing, since this way you aren't accumulating everything into one big scalar before processing it.
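To make that concrete, here's a minimal runnable sketch of the same idea. The junk-stripping pattern ([^\w\s]+) and the hash name %word_count are only placeholders for illustration, not your actual regexes:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %word_count;
    while (my $line = <STDIN>) {
        $line =~ s/[^\w\s]+//g;              # strip "junk" (placeholder pattern)
        for my $w (split /\s+/, $line) {     # split the line into words
            next unless length $w;           # skip empty tokens from leading whitespace
            $word_count{ lc $w }++;          # count case-insensitively
        }
    }

    # print the counts, most frequent first
    for my $w (sort { $word_count{$b} <=> $word_count{$a} } keys %word_count) {
        print "$word_count{$w}\t$w\n";
    }

The point is that each line is cleaned, split, and counted before the next one is read, so memory stays flat regardless of file size.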

Frank Wiles <frank@wiles.org>
http://www.wiles.org

Re^2: How much can this text processing be optimized?
by YAFZ (Pilgrim) on May 16, 2005 at 14:13 UTC
    Much better now! I was thinking along similar lines about the issue of putting everything into a huge scalar at once, and trying your solution gave about a 2 sec. runtime. The C# code does one thing more, separating each word into syllables, but that's another story. Thanks for the hint.