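perl : runs the Perl interpreter.
-nE : -n loops over the input one line at a time without printing it automatically; -E runs the code given on the command line with newer features such as say enabled.
$1 stands for : ([^\(]+), everything before the opening parenthesis, for example Metan1_4283 in Metan1_4283(Metac1_3189).
$2 stands for : the (.*) in \((.*)\), the text between the parentheses, for example Metac1_3189 in Metan1_4283(Metac1_3189).
$1 ne $2 : true if the first captured string is not equal to the second (ne is string inequality, not a pattern match).
++ $seen{$2} : adds one to the count kept for $2 in the hash %seen.
< 2 : less than 2, so a given $2 passes only the first time it is counted.
say "$1 : $2" : prints the two captures as two columns separated by " : ", followed by a newline.
raw_data.txt : path to the input file goes here.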
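
To see how these pieces fit together, here is a minimal sketch of the same logic written out as a small script. It is a reconstruction under assumptions, not the exact command: the sub name filter_pairs and the sample lines are made up for illustration, and the regex is assumed to be ([^\(]+)\((.*)\).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not part of the original command): takes input
# lines and returns the "first : second" strings that pass both checks.
sub filter_pairs {
    my @lines = @_;
    my (%seen, @out);
    for my $line (@lines) {
        # $1 = text before "(", $2 = text between the parentheses
        next unless $line =~ /([^\(]+)\((.*)\)/;
        my ($first, $second) = ($1, $2);
        # Keep the pair only if the two IDs differ and this is the
        # first time this second ID is counted in %seen.
        if ($first ne $second and ++$seen{$second} < 2) {
            push @out, "$first : $second";
        }
    }
    return @out;
}

# Made-up sample lines in the same format as the example
say for filter_pairs(
    "Metan1_4283(Metac1_3189)",
    "Metan1_9999(Metac1_3189)",   # second ID already counted, skipped
    "Metac1_0001(Metac1_0001)",   # both IDs identical, skipped
);
```

With these three sample lines only Metan1_4283 : Metac1_3189 is printed. Because of the short-circuiting and, the counter in %seen is only incremented for lines where the two IDs differ.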