in reply to select only duplicate entries

What have you tried? You haven't demonstrated any effort at solving your own problems. (I presume combine duplicate entries was also posted by you.)

A hash keyed by protein would be useful. Each value would be a list (an array reference) of the organs seen for that protein. You can use split to separate the protein from the organ.
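
Something along these lines might get you started. It is only a rough sketch: you haven't shown your data, so I am assuming each line holds a protein and an organ separated by whitespace, and the sample lines and the name organs_by_protein are made up for illustration.

```perl
use strict;
use warnings;

# Build a hash of arrays: protein => [ organ, organ, ... ].
# ASSUMPTION: each line looks like "protein organ", separated by whitespace.
sub organs_by_protein {
    my @lines = @_;
    my %organs_for;
    for my $line (@lines) {
        # split ' ' skips leading whitespace; the limit of 2 keeps
        # everything after the first field together as the organ.
        my ($protein, $organ) = split ' ', $line, 2;
        next unless defined $organ;    # skip blank or malformed lines
        $organ =~ s/\s+\z//;           # drop trailing newline/whitespace
        push @{ $organs_for{$protein} }, $organ;
    }
    return \%organs_for;
}

# Made-up sample data, not taken from the original question.
my @sample = ("P1 liver", "P2 heart", "P1 kidney");
my $organs_for = organs_by_protein(@sample);

# Report only the proteins that appear more than once.
for my $protein (sort keys %$organs_for) {
    my @organs = @{ $organs_for->{$protein} };
    next unless @organs > 1;
    print "$protein: @organs\n";    # prints "P1: liver kidney"
}
```

If your fields are separated by tabs or commas instead, change the first argument to split accordingly. Combining the entries rather than selecting the duplicates is then just a matter of dropping the "next unless" test in the final loop.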