in reply to remove entries with duplicate characters

If entries with the same GN are always consecutive, you can just remember the previous GN while processing the file line by line. If the GN is different, remember the header and print it before printing the sequence; otherwise, skip both the header and the sequence.

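#!/usr/bin/perl
use warnings;
use strict;

my $previous_gn = "";
my $header;
while (<>) {
    if (my ($gn) = /^>.* GN=([^ ]+)/) {
        # Header line: remember it only if its GN differs from the previous one.
        if ($gn ne $previous_gn) {
            $previous_gn = $gn;
            $header = $_;
        }
    } else {
        # Sequence line: print it together with its header if the header was kept.
        if ($header) {
            print $header, $_;
            undef $header;
        }
    }
}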

If the headers with the same GN don't have to be consecutive, you need to remember all the GNs seen so far. A hash is the best structure to remember them:

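#!/usr/bin/perl
use warnings;
use strict;

my %seen;
my $header;
my $gn;
while (<>) {
    if (/^>.* GN=([^ ]+)/) {
        # Header line: keep it only if this GN has not been seen yet.
        $gn = $1;
        $header = exists $seen{$gn} ? undef : $_;
    } elsif ($header) {
        # Sequence line of a kept header: mark the GN as seen and print both.
        undef $seen{$gn};
        print $header, $_;
    }
}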
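
Both scripts print a single line after each kept header. If a sequence can wrap over several lines (an assumption, the thread doesn't say so), something along these lines might work instead. The sub name filter_by_gn is made up for illustration, and keeping headers that have no GN= field is also just an assumption:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: keep the first record for each GN, together
# with every sequence line that follows its header.
sub filter_by_gn {
    my @lines = @_;
    my %seen;
    my $keep = 0;    # true while inside a record that should be kept
    my @out;
    for my $line (@lines) {
        if ($line =~ /^>/) {
            # \S+ instead of [^ ]+ so a GN at the end of the line
            # does not capture the trailing newline.
            my ($gn) = $line =~ / GN=(\S+)/;
            # Assumption: headers without a GN= field are always kept.
            $keep = !defined($gn) || !$seen{$gn}++;
        }
        push @out, $line if $keep;
    }
    return @out;
}
```

To use it as a filter, add `print filter_by_gn(<>);` at the end. Note that this reads the whole input into memory first, which the line-by-line scripts avoid.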

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]