in reply to remove entries with duplicate characters
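    #!/usr/bin/perl
    use warnings;
    use strict;

    my $previous_gn = "";
    my $header;
    while (<>) {
        if (my ($gn) = /^>.* GN=([^ ]+)/) {
            # Header line: keep it only if its GN differs from the previous one.
            if ($gn ne $previous_gn) {
                $previous_gn = $gn;
                $header = $_;
            }
        } else {
            # Sequence line: print it together with its header if the header was kept.
            # Only the first line after a kept header is printed.
            if ($header) {
                print $header, $_;
                undef $header;
            }
        }
    }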
If the headers with the same GN don't have to be consecutive, you need to remember all the GNs seen so far. A hash is the best structure to remember them:
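    #!/usr/bin/perl
    use warnings;
    use strict;

    my %seen;
    my $header;
    my $gn;
    while (<>) {
        if (/^>.* GN=([^ ]+)/) {
            $gn = $1;
            # Keep the header only if this GN has not been printed yet.
            $header = exists $seen{$gn} ? undef : $_;
        } elsif ($header) {
            # Creates the key with an undefined value, which is enough for exists.
            undef $seen{$gn};
            # The header is printed before every line that follows it,
            # so this expects one sequence line per record.
            print $header, $_;
        }
    }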
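To see the hash approach on concrete data, here is a minimal sketch that runs on an in-memory list instead of reading from a file. The sample records and the `dedupe_by_gn` helper are made up for illustration and are not from the original question. It assumes every header line carries a `GN=` field. Unlike the scripts above, it keeps a flag rather than the header itself, so records with several sequence lines would also pass through whole.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical sample records (not from the original thread).
my @lines = (
    ">sp|P1|A_HUMAN Protein A GN=ABC1 PE=1\n",
    "MKT\n",
    ">sp|P2|B_HUMAN Protein B GN=XYZ2 PE=1\n",
    "MLL\n",
    ">sp|P3|C_HUMAN Protein C GN=ABC1 PE=2\n",
    "MQQ\n",
);

# Return only the lines belonging to records whose GN was not seen before.
sub dedupe_by_gn {
    my @input = @_;
    my (%seen, $keep, @output);
    for my $line (@input) {
        if ($line =~ /^>.* GN=([^ ]+)/) {
            # $seen{$1}++ is false the first time a GN appears, true afterwards.
            $keep = !$seen{$1}++;
        }
        push @output, $line if $keep;
    }
    return @output;
}

print dedupe_by_gn(@lines);
```

With this sample, the third record repeats `GN=ABC1` and is dropped, so only the first two records are printed.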