Everyone learns by making mistakes. Demonstrate you can learn from those, and you will be well-regarded on this forum.
If you post input and desired output (good), make sure they correspond. In this case, it would have been appropriate to include only the first section of output, so that the two match up.
I believe this comes closer to your spec: it clears $header after the first print, so the header appears only once.
#!/usr/bin/perl
use strict;
use warnings;

local $/; # Slurp
my $content = <DATA>;
my ($header) = $content =~ /^(>.*?\n)/m;

while ($content =~ /^[\w]*?([VMFWLCA]{8,})[\w]*?$/mg) {
    my $sequence = $1;
    print $header, "contains $sequence at position ",
        pos($content) - length($sequence), "\n";
    $header = "";
}

__DATA__
>P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A; Synonyms=HLAA;M
MAVMAPRTLVLLLSGALALTQTWAGSHSMRYFYTSVSRPGRGEPRFIAVGYVDDT
QFVRFDSDAASQRMEPRAPWIEQEGPEYWDRNTRNVKAHSQTDRANLGTLRGYYNQSEDGS
TIQRMYGCDVGPDGRFLRGYQQDAYDGKDYIALNEDLRSWTAADMAAQITQRKW
ETAHEAEQWRAYLEGRCVEWLRRYLENGKETLQRTDAPKTHMTHHAVSDHEATL
RCWALSFYPAEITLTWQRDGEDQTQDTELVETRPAGDGTFQKWASVVVPSGQEQ
RYTCHVQHEGLPKPLTLRWEPSSQPTIPIVGIIAGLVLFGAVIAGAVVAAVMWRRKS
SDRKGGSYSQAASSDSAQGSDMSLTACKVVAVLMLCLAVIFLC
outputs:
>P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A; Synonyms=HLAA;M
contains AVVAAVMW at position 420
contains VVAVLMLCLAV at position 461
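One caveat about those positions: the trailing [\w]*?$ in the pattern above consumes the rest of the line, so pos($content) sits at the end of the line when each match completes, and pos minus length is counted back from there rather than from where the run begins. If the start offset of the run is what you are after, the @- array should give it directly. A minimal sketch, using a made-up two-line input rather than your data (the variable names are only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input, not the poster's data: a header line and one sequence line.
my $content = ">hdr\nGGAVVAAVMWRRKS\n";

my @starts;
while ($content =~ /^[\w]*?([VMFWLCA]{8,})[\w]*?$/mg) {
    my $sequence  = $1;
    my $start     = $-[1];                              # offset where capture group 1 begins
    my $from_pos  = pos($content) - length($sequence);  # counted back from the end of the line
    push @starts, $start;
    print "contains $sequence: start offset $start, pos-length $from_pos\n";
}
```

The two numbers agree only when the run is the last thing on its line.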
Note that I've changed your + quantifiers to * so that your regular expression can also match entries at the start and end of a line, not just in the middle. I assume this was an oversight on your part; sorry if this assumption is incorrect.
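To see the difference, here is a small sketch. It assumes your original pattern had [\w]+? where the code above has [\w]*?, and the three test lines are invented for the demonstration: each holds a run of exactly eight matching residues, at the start, in the middle, and at the end of the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical test lines: the same 8-residue run at the start,
# in the middle, and at the end of a line.
my @lines = ('AVVAAVMWRRKS', 'GGAVVAAVMWRR', 'RRKSAVVAAVMW');

# Assumed original form (+) versus the form used in the script above (*).
my $plus = qr/^[\w]+?([VMFWLCA]{8,})[\w]+?$/;
my $star = qr/^[\w]*?([VMFWLCA]{8,})[\w]*?$/;

my (@plus_hits, @star_hits);
for my $line (@lines) {
    push @plus_hits, $line if $line =~ $plus;   # needs at least one character on each side
    push @star_hits, $line if $line =~ $star;   # allows zero characters on either side
}

print "+ matches: @plus_hits\n";
print "* matches: @star_hits\n";
```

With +, only the line with the run in the middle matches, because [\w]+? insists on at least one character before and after the run. A longer run at the edge of a line could still match with +, but it would give up its edge character to [\w]+?, so the captured sequence would come back shortened.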