Sean O'Rourke's code for the Wide Finder benchmark does something very similar; you might need to change only a few lines.
The idea is to partition the input file into n partitions for n processes, forked via a pipe open. Each one works independently on its part of the file, stores its result in a hash, then passes the hash back to the parent over the pipe using Storable (the child writes to its STDOUT, which the parent reads from the pipe handle). Then you just need to merge the hashes into one.
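A minimal sketch of that approach, not Sean O'Rourke's actual code. The names count_range and parallel_count are made up for illustration, and counting lines by their first whitespace-separated field is only a stand-in for whatever each worker really computes. It assumes a Unix-like system where the forking form of open is available.

```perl
use strict;
use warnings;
use Storable qw(store_fd fd_retrieve);

# Hypothetical worker: count lines by their first field within the byte
# range [$start, $end). A line belongs to the partition in which it starts.
sub count_range {
    my ($file, $start, $end) = @_;
    my %count;
    open my $in, '<', $file or die "open $file: $!";
    if ($start > 0) {
        # Back up one byte and discard through the next newline, so that
        # reading resumes at the first line starting at or after $start.
        seek $in, $start - 1, 0;
        <$in>;
    }
    while (tell($in) < $end and defined(my $line = <$in>)) {
        my ($key) = split ' ', $line;
        $count{$key}++ if defined $key;
    }
    close $in;
    return \%count;
}

# Hypothetical driver: fork $n children via pipe open, one per partition,
# then merge the hashes they send back.
sub parallel_count {
    my ($file, $n) = @_;
    my $size  = -s $file;
    my $chunk = int($size / $n) + 1;
    my @pipes;
    for my $i (0 .. $n - 1) {
        my $start = $i * $chunk;
        my $end   = $start + $chunk;
        $end = $size if $end > $size;
        # Forking open: the child's STDOUT is connected to $fh in the parent.
        my $pid = open my $fh, '-|';
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: serialize the partial result to STDOUT and exit.
            store_fd(count_range($file, $start, $end), \*STDOUT);
            exit 0;
        }
        push @pipes, $fh;
    }
    # Parent: read one hash per child and merge them into one.
    my %total;
    for my $fh (@pipes) {
        my $part = fd_retrieve($fh);
        $total{$_} += $part->{$_} for keys %$part;
        close $fh;
    }
    return \%total;
}
```

Calling parallel_count($file, 4) would return a single hash reference with the merged counts. Only the body of count_range and the merge line should need changing for a different per-line computation.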
The idea is to partition the input file into n partitions for n processes,
Perhaps you can explain how this scheme will allow "I'm looking for pairs ie A1:A2, B1:B2 etc. Each member of the pair can exist anywhere in the file."?