Everyone learns by making mistakes. Demonstrate you can learn from those, and you will be well-regarded on this forum.
If you post input and desired output (which is good), make sure they correspond. In this case, it would have been appropriate to include only the first section of output, so that the two match up.
I believe this works closer to your spec; it clears the contents of $header after the first print, so it will only appear once.
#!/usr/bin/perl
use strict;
use warnings;
local $/; # Slurp
my $content = <DATA>;
my ($header) = $content =~ /^(>.*?\n)/m;
while ($content =~ /^[\w]*?([VMFWLCA]{8,})[\w]*?$/mg) {
    my $sequence = $1;
    print $header, "contains $sequence at position ",
        pos($content) - length($sequence), "\n";
    $header = "";
}
__DATA__
>P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A; Synonyms=HLAA;M
MAVMAPRTLVLLLSGALALTQTWAGSHSMRYFYTSVSRPGRGEPRFIAVGYVDDT
QFVRFDSDAASQRMEPRAPWIEQEGPEYWDRNTRNVKAHSQTDRANLGTLRGYYNQSEDGS
TIQRMYGCDVGPDGRFLRGYQQDAYDGKDYIALNEDLRSWTAADMAAQITQRKW
ETAHEAEQWRAYLEGRCVEWLRRYLENGKETLQRTDAPKTHMTHHAVSDHEATL
RCWALSFYPAEITLTWQRDGEDQTQDTELVETRPAGDGTFQKWASVVVPSGQEQ
RYTCHVQHEGLPKPLTLRWEPSSQPTIPIVGIIAGLVLFGAVIAGAVVAAVMWRRKS
SDRKGGSYSQAASSDSAQGSDMSLTACKVVAVLMLCLAVIFLC
outputs:
>P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A; Synonyms=HLAA;M
contains AVVAAVMW at position 420
contains VVAVLMLCLAV at position 461
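One caveat about the reported positions: the trailing [\w]*?$ consumes the rest of the line, so pos($content) sits at the end of the line after each match, and pos($content) - length($sequence) is measured back from there rather than from where the run begins. If you want the offset where the run itself starts, $-[1] (the start of the first capture group) may be closer to what you need. A minimal sketch, assuming that is what you are after; the find_runs helper and the sample string are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns a list of
# [sequence, start_offset] pairs, at most one per line of $text.
sub find_runs {
    my ($text) = @_;
    my @hits;
    while ($text =~ /^\w*?([VMFWLCA]{8,})\w*?$/mg) {
        # $-[1] is the 0-based offset at which capture group 1 begins
        push @hits, [ $1, $-[1] ];
    }
    return @hits;
}

# Made-up input, for illustration only
my $text = ">header\nGGGAVVAAVMWRRKS\nVVAVLMLCLAVIFLC\n";
printf "%s starts at offset %d\n", @$_ for find_runs($text);
```

The offsets are 0-based and count every character of the slurped string, header and newlines included.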
Note I've changed your + to * so that your regular expression can also match entries at the start and end of lines, not just in the middle. I assume this was an oversight on your part; sorry if this assumption is incorrect.
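To illustrate the difference between the two quantifiers, here is a small sketch. The + pattern is my assumption of what the original looked like, and the test lines and the captured helper are made up:

```perl
use strict;
use warnings;

# Assumed shape of the original pattern (with +) next to the relaxed one (with *)
my $plus = qr/^\w+?([VMFWLCA]{8,})\w+?$/;
my $star = qr/^\w*?([VMFWLCA]{8,})\w*?$/;

# Hypothetical helper: returns the captured run, or 'no match'
sub captured {
    my ($re, $line) = @_;
    return $line =~ $re ? $1 : 'no match';
}

# Made-up lines with an 8-residue run at the start, middle, and end
for my $line ('AVVAAVMWGGG', 'GGGAVVAAVMWGGG', 'GGGAVVAAVMW') {
    printf "%-15s +: %-9s *: %s\n",
        $line, captured($plus, $line), captured($star, $line);
}
```

With +, at least one word character is required on each side of the run, so a run of exactly eight residues at either edge of a line is not found. A longer run at an edge would still match, but the capture would lose one residue to the mandatory \w.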