the next thing I want is to avoid duplicates in my output file
I don't know which duplicates you want to avoid, so I guessed and came up with the following. If there are particular kinds of duplicates you are interested in, or if you need to ignore the entire entry, you will have to be more specific.
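use strict;
use warnings;

my %seen;    # lowercased values already printed

while (<>) {
    if (/^DE|^GN/) {
        # Skip description lines for uncharacterized proteins
        next if (/Putative uncharacterized protein/);

        # Each value sits between '=' and ';'
        foreach (/=([^;]+);/g) {
            my $lc = lc($_);
            if ( $seen{$lc}++ > 0) {
                print "hey! we already saw $lc!!\n";
            } else {
                print "$lc\n";
            }
        }
    } elsif (/^ID/) {
        # Blank line between entries
        print "\n";
    }
}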
In reply to Re: how to parse a UniProt Flat file
by ig
in thread how to parse a UniProt Flat file
by stanleysj