in reply to Merging Complex Hashes

One thing I didn't quite understand is the following:

Suppose you have four bacteria, A, B, C and D. A and B have one gene in common, B and C have one gene in common, and C and D have one in common. Should they all be merged into the same group?

If yes, then your problem represents an undirected graph (bacteria are the nodes, and two bacteria are joined by an edge whenever they share a gene), and you're looking for its connected components - which is a fairly standard problem, and can be solved easily once you turn the hashes around, i.e. for each gene you store which bacteria it is part of.
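
To make that concrete, here is a rough sketch of the idea, in Python rather than Perl, with dicts playing the role of hashes. I'm assuming the data looks like bacterium => list of genes; the names group_bacteria, genes_of and bacteria_with, and the gene labels g1, g2 and so on, are made up for illustration and not taken from your code. It first inverts the hash, then walks from each unvisited bacterium to everything reachable through shared genes.

```python
from collections import defaultdict

def group_bacteria(genes_of):
    """Group bacteria that are linked, directly or indirectly, by shared genes.

    genes_of: dict mapping bacterium name -> list (or set) of gene names.
    This layout is an assumption; adapt it to your real structure.
    """
    # Turn the hash around: for each gene, which bacteria carry it.
    bacteria_with = defaultdict(set)
    for bacterium, genes in genes_of.items():
        for gene in genes:
            bacteria_with[gene].add(bacterium)

    seen = set()
    groups = []
    for start in genes_of:
        if start in seen:
            continue
        # Depth-first walk over everything reachable from 'start'.
        seen.add(start)
        group = set()
        stack = [start]
        while stack:
            current = stack.pop()
            group.add(current)
            for gene in genes_of[current]:
                for neighbour in bacteria_with[gene]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
        groups.append(group)
    return groups

# The A, B, C, D chain from the question, plus an unrelated bacterium E.
example = {
    "A": ["g1"],
    "B": ["g1", "g2"],
    "C": ["g2", "g3"],
    "D": ["g3"],
    "E": ["g9"],
}
groups = group_bacteria(example)
```

With this input A, B, C and D end up in one group and E in a group of its own. If the answer to the question above is "no" (only bacteria that directly share a gene belong together), this approach does not apply as written.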