Oh, I thought the documentation in the module's source code made that part obvious:
"Now we want to find a "score" for this paragraph, finding the best set
of keywords which "apply" to it. We favour keyword sets which have a
large number of matches (obviously a paragraph is better if it matches
"a" and "c" than if it just matches "a") and with multi-word keywords.
(A paragraph which matches "fresh cheese sandwiches" en bloc is worth
picking out, even if it has no other matches.)"
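To make that scoring idea concrete, here is a minimal sketch. It is hypothetical and not the module's actual code: the name `score_paragraph` and the weighting (each matched keyword adds its word count) are assumptions chosen only to show the two preferences the comment describes. More matches raise the score, and a multi-word keyword matched en bloc counts for more than a single word.

```perl
use strict;
use warnings;

# Hypothetical scoring sketch; NOT the module's real algorithm.
# Each keyword found in the paragraph adds its word count to the score,
# so more matches score higher and "fresh cheese sandwiches" (3 words)
# outweighs a single-word match.
sub score_paragraph {
    my ($paragraph, @keywords) = @_;
    my $score = 0;
    for my $kw (@keywords) {
        # \Q...\E escapes regex metacharacters in the keyword;
        # \b keeps "cheese" from matching inside "cheesecake".
        next unless $paragraph =~ /\b\Q$kw\E\b/i;
        my @words = split ' ', $kw;
        $score += @words;    # array in scalar context = word count
    }
    return $score;
}

my $para = "We sell fresh cheese sandwiches daily.";
print score_paragraph($para, "fresh cheese sandwiches", "bread"), "\n";  # prints 3
```

With this weighting a paragraph matching only "fresh cheese sandwiches" (score 3) beats one matching just "apple" and "cheese" (score 2), which mirrors the "worth picking out, even if it has no other matches" remark.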
The intent seems to be to measure how strongly a set of keywords applies to a given paragraph: more matches, and multi-word matches, mean a better fit and more relevance.
On second thought, there's really nothing to be gained by turning the algorithm on its side; it already plays to Perl's strengths.
If speed is a concern, profile first to find where the bottleneck actually is. As Tom Duff (of Duff's Device) put it:
"If your code is too slow, you must make it faster. If no better algorithm is available, you must trim cycles."
Step one: figure out where the trouble really is (profile).
Step two: try to devise a better algorithm for that particular segment of code.
Step three (if step two fails): trim cycles.
That may be easier said than done, but unless you're already certain this particular loop is your problem, we can't be sure that optimizing it will help.
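For step one, the usual tool is Devel::NYTProf (run `perl -d:NYTProf yourscript.pl`, then `nytprofhtml` to browse the report). If you just want a quick number for one suspect section, a crude hand-rolled timer built on the core Time::HiRes module is sketched below. The `timed` helper and the workload are hypothetical stand-ins for the module's loop, not anything from its source.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Crude hand-rolled section timer (hypothetical helper). A real profiler
# such as Devel::NYTProf reports per-line and per-sub timings without
# editing the code; this only brackets the blocks you wrap.
my %elapsed;    # label => accumulated wall-clock seconds

sub timed {
    my ($label, $code) = @_;
    my $start  = time();
    my @result = $code->();
    $elapsed{$label} += time() - $start;
    return wantarray ? @result : $result[0];
}

# Hypothetical workload standing in for the module's matching loop.
my @paragraphs = ("fresh cheese sandwiches", "apple and cheese") x 1000;
my $hits = timed('match loop', sub {
    my $n = 0;
    for my $p (@paragraphs) { $n++ if $p =~ /\bcheese\b/ }
    return $n;
});

printf "%-12s %.6fs (%d hits)\n", $_, $elapsed{$_}, $hits for sort keys %elapsed;
```

Wrapping two or three candidate sections this way shows which one dominates before you spend effort on steps two and three.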
The module's source code itself offers a clue right after that loop:
#XXX : Possible optimization: Give up if there are no matches
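A minimal sketch of what that "give up if there are no matches" idea might look like follows. It assumes, hypothetically, that the expensive best-keyword-set search can be passed in as a code ref and fed only the keywords that actually matched. The names `score_with_early_exit` and `$expensive_search` are illustrative and do not come from the module.

```perl
use strict;
use warnings;

# Hypothetical early-exit sketch: a cheap pre-check filters the keywords,
# and the expensive search is skipped entirely when nothing matched.
sub score_with_early_exit {
    my ($paragraph, $expensive_search, @keywords) = @_;
    # Keep only keywords that occur as whole words (case-insensitive).
    my @matches = grep { $paragraph =~ /\b\Q$_\E\b/i } @keywords;
    return 0 unless @matches;    # no matches: give up early
    return $expensive_search->($paragraph, @matches);
}

# Stub "expensive" search that just counts the matched keywords it receives.
my $calls  = 0;
my $search = sub { $calls++; my ($p, @found) = @_; return scalar @found };

print score_with_early_exit("nothing relevant", $search, "cheese"), "\n";        # prints 0
print score_with_early_exit("cheese please", $search, "cheese", "bread"), "\n";  # prints 1
print "expensive search ran $calls time(s)\n";                                   # ran 1 time(s)
```

Whether this pays off depends on how many paragraphs have no matches at all, which is exactly what profiling would tell you.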