in reply to Mysterious slow down with large data set

I cannot figure out what would make the program slow down so much...

Unnecessary work:

foreach $w1 (sort(keys %kernel)){ ... foreach $w2 (sort(keys %kernel)) { ... } }

For every word in the hash, you re-sort all of the hash keys (even though you've already sorted them once). Then you compare that word to every word in the hash, including pairs you've already compared in the other direction.

Instead, sort the hash keys and assign them to an array. Then process the array one word at a time. Grab the first word, then compare it to the remaining words in the array. Then do the same for the second and so on. Note that the list of words to compare gets shorter with every word you process. This is exactly what you want.
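For illustration, here's a minimal sketch of that approach. The %kernel contents and the similarity() sub are placeholders (the real computation isn't shown in the post); only the loop structure is the point:

    use strict;
    use warnings;

    # Placeholder data and similarity sub -- stand-ins for whatever the
    # real program computes.
    my %kernel = ( apple => 1, banana => 2, cherry => 3 );
    sub similarity { my ( $w1, $w2 ) = @_; return abs( $kernel{$w1} - $kernel{$w2} ); }

    # Sort the keys once, outside the loops.
    my @words = sort keys %kernel;

    # Compare each word only to the words that follow it in the array;
    # the list of remaining comparisons shrinks as the outer index advances.
    for my $i ( 0 .. $#words - 1 ) {
        my $w1 = $words[$i];
        for my $j ( $i + 1 .. $#words ) {
            my $w2 = $words[$j];
            my $sim = similarity( $w1, $w2 );
            print "$w1 vs $w2: $sim\n";
        }
    }

The sort now runs once instead of once per outer iteration, and each pair of words is visited exactly once.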


Improve your skills with Modern Perl: the free book.

Re^2: Mysterious slow down with large data set
by jsmagnuson (Acolyte) on Feb 26, 2012 at 23:52 UTC

    Thanks for pointing out the unnecessary work.

    Can I ask you about the idea of decreasing the size of the list with each word? The problem is that I need the total similarity of each word to every other word. So I don't think I can decrease the number of comparisons per step without storing the earlier results, and the memory demands for that are huge, even if I delete items from memory once they are retrieved.

    But if you see another solution, please let me know. Thank you for your help!

      I don't know whether your comparison of two words is direction dependent, that is, whether comparing word1 with word2 gives a different result than comparing word2 with word1. If it is, keep your algorithm. If it is not (the measure is symmetric), follow chromatic's tip: compare the first element to every element in the array except the first (the element itself), then compare the second element to every element except the first two, because the first has already been compared to the second, and so on. That is what chromatic was pointing to.
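      If the measure is symmetric, you can still get each word's total from the shortened loop: add each pairwise result to both words' running totals, so no intermediate results need to be kept around. A minimal sketch, again with placeholder data and a placeholder similarity() sub:

          use strict;
          use warnings;

          my %kernel = ( apple => 1, banana => 2, cherry => 3 );    # placeholder data
          sub similarity { my ( $w1, $w2 ) = @_; return abs( $kernel{$w1} - $kernel{$w2} ); }

          my @words = sort keys %kernel;
          my %total;    # running total similarity of each word to every other word

          for my $i ( 0 .. $#words - 1 ) {
              for my $j ( $i + 1 .. $#words ) {
                  my $sim = similarity( $words[$i], $words[$j] );
                  # One comparison feeds both words' totals, so every pair is
                  # visited exactly once and only the sums are stored.
                  $total{ $words[$i] } += $sim;
                  $total{ $words[$j] } += $sim;
              }
          }

          print "$_: $total{$_}\n" for @words;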