So, you have code that, essentially, has a runtime of N^3 and you're wondering why its runtime slows down dramatically as N goes up? *blinks*
A few thoughts:
- This code looks to be extremely parallelizable. Maybe something like Parallel::ForkManager might help.
- The number of "#additional things here" looks very suspicious. Maybe, you shouldn't do so much inside loops?
- Essentially, you're attempting to get a large set of mother-father pairs, then do stuff with those pairs. This sounds very much like a directed cartesian join. That screams a relational database to me. They are very efficient at set generation.
My criteria for good software:
- Does it work?
- Can someone else come in, make a change, and be reasonably certain no bugs were introduced?