in reply to Re: need help with a regex
in thread need help with a regex
>P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A; Synonyms=HLAA;M
MAVMAPRTLVLLLSGALALTQTWAGSHSMRYFYTSVSRPGRGEPRFIAVGYVDDT
QFVRFDSDAASQRMEPRAPWIEQEGPEYWDRNTRNVKAHSQTDRANLGTLRGYYNQSEDGS
TIQRMYGCDVGPDGRFLRGYQQDAYDGKDYIALNEDLRSWTAADMAAQITQRKW
ETAHEAEQWRAYLEGRCVEWLRRYLENGKETLQRTDAPKTHMTHHAVSDHEATL
RCWALSFYPAEITLTWQRDGEDQTQDTELVETRPAGDGTFQKWASVVVPSGQEQ
RYTCHVQHEGLPKPLTLRWEPSSQPTIPIVGIIAGLVLFGAVIAGAVVAAVMWRRKS
SDRKGGSYSQAASSDSAQGSDMSLTACKV
and the output should look like:
Hydrophobic stretch found in: P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A; Synonyms=HLAA;
AVVAAVMW
The match was at position: 325
Hydrophobic stretch found in:
A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 | Name=DISP2; Synonyms=DISPB, KIAA1742;
VAVLMLCLAVIFLC
The match was at position: 170
LLALVAIFF
The match was at position: 493
IWICWFAALAA
The match was at position: 705
LALALAFA
The match was at position: 970
Hydrophobic region(s) found in 2 sequences out of 15 sequences
Re^3: need help with a regex
by kennethk (Abbot) on Oct 22, 2010 at 20:57 UTC
"Sorry about the messy code I wrote I know it wasn't actually legit"
Writing pseudocode is considered good practice when you don't know a language. That means explaining clearly what you want an algorithm to do, not just posting gibberish from the target language.
"I was just in a hurry and trying to get the basic idea across"
Which you did not do, nor have you done effectively yet. Perhaps the more verbose How To Ask Questions The Smart Way may provide clear guidance on how to effectively construct questions on internet forums. You still did not answer my questions on your own experience level. I will assume you are an extreme novice with access to a working script crafted by another. I can give you aid on this particular problem, but if you expect to get anywhere in the long run, you will need to learn some very basic coding concepts you apparently lack.
In examining your desired output, I note that several of your character sequences do not appear in your text block, e.g. "VAVLMLCLAVIFLC", "LLALVAIFF", ... I note that "AVVAAVMW" is cited at "position: 325". This makes me suspect that the original file you are parsing does not contain the white space you are posting, or that your script modifies the input before filtering. I have modified your originally posted code to do something like what you request, though the numbers are wrong.
outputs:
>P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A; Synonyms=HLAA;M
contains AVVAAVMW at position 420
I leave modifying it to get what you expect as an exercise for you. You will likely want to read the documentation at perlsyn, perlre, perlretut, pos and length.
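To illustrate the white-space suspicion above, here is a minimal sketch rather than the code from this thread. The two-line record, the character class and the minimum length of 8 are all assumptions made for the example. It shows how a run split across a line break is missed until the newlines are removed.

```perl
use strict;
use warnings;

# Hypothetical wrapped record: the run "AVVAAVMW" straddles the line break.
my $record = "KKAVVA\nAVMWRR\n";

# Assumed hydrophobic set and minimum length; the real pattern is not shown here.
my $hydrophobic = qr/([VILMFWCA]{8,})/;

# Matching the raw text fails because "\n" interrupts the run.
my $raw_hit = $record =~ $hydrophobic ? 1 : 0;

# Remove newlines and any other white space, then match again.
(my $joined = $record) =~ s/\s+//g;

my ($run, $start);
if ($joined =~ $hydrophobic) {
    $run   = $1;
    $start = $-[1];    # 0-based offset of capture group 1
}

print "raw match: $raw_hit\n";
printf "joined: %s at offset %d\n", $run, $start;
```

Offsets taken from the joined string count residues only, which may be closer to the positions in the desired output than offsets into the raw file.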
by jwkrahn (Abbot) on Oct 22, 2010 at 21:33 UTC
What did you want "pos($content) - length($sequence)" to give you? The whole pattern starts at 371 and ends at 428 while the contents of $1 start at 416 and end at 424 so 420 is somewhere in the middle of $1. Have a look at the @- and @+ arrays for the start and end positions of matches.
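A small sketch of the @- and @+ suggestion, using a made-up sequence and a made-up pattern (neither is from the original script): $-[0] and $+[0] bound the whole match, $-[1] and $+[1] bound the first capture group, and pos() gives the end of the whole match.

```perl
use strict;
use warnings;

# Hypothetical data: a charged residue, a hydrophobic run, a charged residue.
my $content = "GGGGKAVVAAVMWRGGGG";
my ($match_start, $match_end, $run_start, $run_end, $end_pos);

if ($content =~ /[KRHDE]([VILMFWCA]{8,})[KRHDE]/g) {
    ($match_start, $match_end) = ($-[0], $+[0]);   # whole match
    ($run_start,   $run_end)   = ($-[1], $+[1]);   # capture group 1 only
    $end_pos = pos($content);                      # same as $+[0]

    printf "whole match: %d to %d\n", $match_start, $match_end;
    printf "run %s: %d to %d\n", $1, $run_start, $run_end;
    printf "pos: %d\n", $end_pos;
}
```

Because pos() marks the end of the whole match, subtracting a length from it locates $1 only when nothing follows the capture group in the pattern. Reading $-[1] directly avoids that arithmetic; the offsets are 0-based, so add 1 for a 1-based residue position.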
by kennethk (Abbot) on Oct 22, 2010 at 21:36 UTC
by aquinom (Sexton) on Oct 22, 2010 at 21:49 UTC
by kennethk (Abbot) on Oct 22, 2010 at 22:06 UTC
outputs:
Note you'd omitted 'I' from your original character set.
by aquinom (Sexton) on Oct 22, 2010 at 22:50 UTC
by aquinom (Sexton) on Oct 22, 2010 at 21:17 UTC
by kennethk (Abbot) on Oct 22, 2010 at 21:28 UTC
Everyone learns by making mistakes. Demonstrate you can learn from those, and you will be well-regarded on this forum. If you post input and desired output (good), make sure they correspond. In this case, only including the first section of output would have been appropriate, so the two match up. I believe this works closer to your spec; it clears the contents of $header after the first print, so it will only appear once.
outputs:
Note I've changed your + to * so that your regular expression can also match entries at the start and end of lines, not just in the middle. I assume this was an oversight on your part; sorry if this assumption is incorrect.
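The two points in this reply can be sketched as follows. This is an illustration under assumptions, not the script posted in the thread: the patterns, the sample sequence and the header text are all hypothetical. The first part shows why a + on the flanking classes rejects a run at the very start of the string while * accepts it; the second shows the header being cleared after its first use.

```perl
use strict;
use warnings;

# Hypothetical sequence whose hydrophobic run sits at the very start.
my $seq = "AVVAAVMWRRKS";

# With +, at least one non-hydrophobic character must precede the run.
my $with_plus = $seq =~ /[^VILMFWCA]+([VILMFWCA]{8,})[^VILMFWCA]+/ ? $1 : undef;

# With *, the flanking parts may be empty, so a run at either edge still matches.
my $with_star = $seq =~ /[^VILMFWCA]*([VILMFWCA]{8,})[^VILMFWCA]*/ ? $1 : undef;

printf "with +: %s\n", defined $with_plus ? $with_plus : 'no match';
printf "with *: %s\n", defined $with_star ? $with_star : 'no match';

# Emit the header once: clear it after the first time it is used.
my $header = "Hydrophobic stretch found in: demo_sequence\n";   # made-up header
my @out;
for my $run ("VAVLMLCL", "LLALVAIF") {                           # made-up runs
    push @out, $header . "$run\n";
    $header = '';
}
print @out;
```

If the flanking classes are not needed for anything else, dropping them and matching only the captured run would behave the same at the edges.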