On the whole, I think we need more information. But here are some vague questions:
- You have taken on an N^2 problem. Most of the work is probably going to be spent interating and comparing. As such, threading is not going to buy you that much. Is this is a multi-CPU box?
- Is the list in memory in a raw or processed state? That is, does it need parsing/evaluating/decoding every time, or is that done once? What I'm trying to ask is: is the list in memory in an access-efficient structure? A little processing time up-front may make yield speedier access.
- Perhaps you can eliminate half of the comparisons by somehow marking the data once you have done AB to prevent BA, as you suggested.
- Silly question maybe, but is there a faster machine with more memory around to use? Or more CPUs?
- Once your algorithm is good, PerlMonks will be able to improve the implementation of it significantly.