The input consists of a numerical string such as GO:007983 and a text string, e.g. 'transport'. Both can occur more than once in the input array (so I can't just swap the keys and values round). The desired output is a listing of all the duplicated numerical strings, along with their associated text strings, which I will print to a file. I will probably also try to generate some simple statistics from these, but that comes later.
if ($overlap_cluster_terms_resplit_line =~ /^\s*(GO:\d+)\s\w+/g) {
    push (@overlap_cluster_terms_nothashed_keysarray, $overlap_cluster_desc);
    $overlap_cluster_terms = $1;
    $overlap_cluster_terms_hash{$overlap_cluster_terms} = $overlap_cluster_desc;
    # unless exists $overlap_cluster_terms_hash{$overlap_cluster_terms};
    if (exists $overlap_cluster_terms_hash{$overlap_cluster_terms}) {
        push (@overlap_debug, $overlap_cluster_desc);
        print OVERLAP_OUTPUT $overlap_cluster_terms;
        print OVERLAP_OUTPUT "\t";
        print OVERLAP_OUTPUT $overlap_cluster_desc;
        print OVERLAP_OUTPUT "\n\n";
    }
}
BTW: $overlap_cluster_desc is set by another if statement inside the foreach loop. Neither the loop nor that if statement is shown.
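One thing that stands out in the snippet: the exists test runs after the key has already been assigned, so it is true for every matching line, not only for repeats. A minimal sketch of one possible alternative is a hash of arrays that collects every description per GO id and then keeps only the ids seen more than once. The sample data, the sub names collect_terms and duplicates, and the assumption that the description follows the id on the same line are all hypothetical, since the real loop is not shown.

```perl
use strict;
use warnings;

# Hypothetical sample input: a GO id followed by its description on one line.
my @lines = (
    'GO:0006810 transport',
    'GO:0006810 transport',
    'GO:0008152 metabolism',
    'GO:0006810 ion transport',
    'GO:0005515 binding',
    'GO:0008152 metabolism',
);

# Build a hash of arrays: every description seen for each GO id, in input order.
sub collect_terms {
    my @input = @_;
    my %desc_for;
    for my $line (@input) {
        # A single match per line is enough, so the /g modifier is left out.
        next unless $line =~ /^\s*(GO:\d+)\s+(.+?)\s*$/;
        push @{ $desc_for{$1} }, $2;
    }
    return \%desc_for;
}

# Keep only the ids that occurred more than once.
sub duplicates {
    my ($desc_for) = @_;
    my %dups;
    for my $id (keys %$desc_for) {
        $dups{$id} = $desc_for->{$id} if @{ $desc_for->{$id} } > 1;
    }
    return \%dups;
}

my $dups = duplicates(collect_terms(@lines));

# One tab-separated line per duplicated id: id, occurrence count, descriptions.
for my $id (sort keys %$dups) {
    my @descs = @{ $dups->{$id} };
    print join("\t", $id, scalar @descs, join('; ', @descs)), "\n";
}
```

Because each id keeps its full list of descriptions, the occurrence counts needed for simple statistics fall out of the same structure, and printing to a filehandle such as OVERLAP_OUTPUT instead of STDOUT only changes the print statement.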