comment on

Like I said, where are the <code> tags around your input and output? Embedding text in HTML is notorious for changing formatting. A regular expression I write will very likely fail because the character sequence displayed on the screen will differ from what you have in your file. I also note your "desired output" is significantly different from what you specified in the original post. For example, the word "Hydrophobic" appears nowhere the OP and the word "contains" appears nowhere in the new spec.

Sorry about the messy code I wrote I know it wasn't actually legit

Writing pseudocode is considered good practice when you don't know a language. That means explain clearly what you want an algorithm to do, not just posting gibberish from the target language.

I was just in a hurry and trying to get the basic idea across

Which you did not do, nor have you done effectively yet. Perhaps the more verbose How To Ask Questions The Smart Way may provide clear guidance on how to effective construct questions on internet forums.

You still did not answer my questions on your own experience level. I will assume you are an extreme novice with access to a working script crafted by another. I can give you aid on this particular problem, but if you expect to get anywhere in the long run, you will need to learn some very basic coding concepts you apparently lack.

In examining your desired output, I note that several of your character sequences do not appear in your text block, e.g. "VAVLMLCLAVIFLC", "LLALVAIFF", ... I note that "AVVAAVMW" is cited at "position: 325". This makes me suspect that the orginal file you are parsing does not contain the white space you are posting or modifies the input before filtering.

I have modified your originally posted code to do something like what you request, though the numbers are wrong.

#!/usr/bin/perl
use strict;
use warnings;

local $/; # Slurp
my $content = <DATA>;

my ($header) = $content =~ /^(>.*?)$/m;
while ($content =~ /^[\w]+?([VMFWLCA]{8,})[\w]+?$/mg) {
    my $sequence = $1;
    print $header, "contains $sequence at position ", pos($content) - 
+length($sequence), "\n";
}
__DATA__
>P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A;
+ Synonyms=HLAA;M

MAVMAPRTLVLLLSGALALTQTWAGSHSMRYFYTSVSRPGRGEPRFIAVGYVDDT
QFVRFDSDAASQRMEPRAPWIEQEGPEYWDRNTRNVKAHSQTDRANLGTLRGYYNQSEDGS
TIQRMYGCDVGPDGRFLRGYQQDAYDGKDYIALNEDLRSWTAADMAAQITQRKW
ETAHEAEQWRAYLEGRCVEWLRRYLENGKETLQRTDAPKTHMTHHAVSDHEATL
RCWALSFYPAEITLTWQRDGEDQTQDTELVETRPAGDGTFQKWASVVVPSGQEQ
RYTCHVQHEGLPKPLTLRWEPSSQPTIPIVGIIAGLVLFGAVIAGAVVAAVMWRRKS
SDRKGGSYSQAASSDSAQGSDMSLTACKV
[download]

outputs

>P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A; Synonyms=HLAA;Mcontains AVVAAVMW at position 420

I leave modifying it to get what you expect as an exercise for you. You will likely want to read the documentation at perlsyn, perlre, perlretut, pos and length.

In reply to Re^3: need help with a regex by kennethk
in thread need help with a regex by aquinom

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.