
First, let me say I know nothing about Orf sequences...

A very quick search of CPAN turned up the BioPerl script examples/longorf.pl, which might be worth a look. Again, I don't know whether BioPerl is useful here... but you might at least get some ideas from looking at the code there.

Looking at the fragment of code you have so far... [and it would be easier to do that if (a) it were enclosed in <code> tags, (b) it were runnable, (c) it came with some sample data, (d) it included a description of what was expected, and (e) there were almost anything else that allowed a humble programmer to understand what was required.]

...as far as I can see, you've collected possible start positions in @startsRF1 and stop positions in @stopsRF1. These positions are marked by certain 3-character sequences, which are constrained to appear at 3-character boundaries. Now you want to process the stuff between those start and stop positions. Because of the way they've been collected, both arrays are in ascending order of string position, which is a start. Now:

Can the region you want to process contain one or more other start and/or end positions? For example, if the starts are (6, 36, 69) and the ends are (42, 57, 90), do you want to look at every start..end pair (6..42, 6..57, 6..90, 36..42, 36..57, 36..90, 69..90), or (36..42, 36..57, 69..90), or just (36..42, 69..90)?
Do the start and end of the string count as start and end positions?
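To make the three readings in the first question concrete, here is a minimal sketch that enumerates them for the example positions. The rule names ("every pair", "no other start in between", "nothing in between") are my interpretation of the three example lists, not something stated in the original question, and the helper any_between is hypothetical.

```perl
use strict;
use warnings;

# Example positions from the question above.
my @starts = (6, 36, 69);
my @ends   = (42, 57, 90);

# Hypothetical helper: true if any element of @$list lies strictly
# between $lo and $hi.
sub any_between {
    my ($list, $lo, $hi) = @_;
    return scalar grep { $_ > $lo && $_ < $hi } @$list;
}

my (@all, @no_start_inside, @nothing_inside);
for my $s (@starts) {
    for my $e (@ends) {
        next unless $s < $e;                     # end must follow start
        push @all, "$s..$e";
        next if any_between(\@starts, $s, $e);   # another start in between
        push @no_start_inside, "$s..$e";
        next if any_between(\@ends, $s, $e);     # another end in between
        push @nothing_inside, "$s..$e";
    }
}

print "every pair:          @all\n";
print "no start in between: @no_start_inside\n";
print "nothing in between:  @nothing_inside\n";
```

Each rule is just the previous one plus an extra test, so whichever reading turns out to be right, it drops straight into the inner loop.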

Whatever the answers to the above, the simple approach is two nested foreach loops: the outer one cycling through the start positions and the inner one through the end positions, deciding which start..end combinations to consider. Inside all that, you can extract each substring using substr. Then... I dunno; I regret I don't know what a protein sequence looks like.
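A minimal runnable sketch of the two-loop approach, under several assumptions since no code or data was posted: the toy sequence is made up, positions are collected in reading frame 1 only, ATG is taken as the start codon and TAA/TAG/TGA as the stop codons, and each extracted region includes its stop codon. It keeps every start..stop pair; which of those regions you actually want depends on the answers to the questions above.

```perl
use strict;
use warnings;

# Toy data: an assumption, since no sample data was posted.
my $seq = 'ATGAAATAGCCCATGTGA';

# Collect positions in reading frame 1 (offsets 0, 3, 6, ...). The codons
# used here (ATG to start; TAA, TAG, TGA to stop) are also an assumption.
my (@startsRF1, @stopsRF1);
for (my $pos = 0; $pos + 3 <= length $seq; $pos += 3) {
    my $codon = uc substr($seq, $pos, 3);
    if    ($codon eq 'ATG')                 { push @startsRF1, $pos }
    elsif ($codon =~ /\A(?:TAA|TAG|TGA)\z/) { push @stopsRF1,  $pos }
}

# Outer loop over starts, inner loop over stops. This keeps every pair with
# the stop downstream of the start; add further tests inside the inner loop
# for a stricter rule. The "+ 3" keeps the stop codon in the substring.
my @regions;
for my $start (@startsRF1) {
    for my $stop (@stopsRF1) {
        next unless $stop > $start;
        push @regions, substr($seq, $start, $stop - $start + 3);
    }
}

print "$_\n" for @regions;
```

Here @startsRF1 ends up as (0, 12) and @stopsRF1 as (6, 15), so the loops produce ATGAAATAG, ATGAAATAGCCCATGTGA and ATGTGA.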

If you have huge numbers of start and end positions then, depending on the answers to the above, you may want a more cunning approach to speed things up. What I have suggested above is O(n*m) for n starts and m ends (O(n^2) when they are roughly equal in number), which is fine for little problems and (frankly) horrible for big ones. But never optimise until you have to, and even then, think twice.
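One possible "more cunning" approach, sketched here only for the (36..42, 36..57, 69..90) reading, where each end is paired with the latest start before it: because both arrays are already in ascending order, you can walk through them together once, which is O(n+m) instead of the nested-loop cost. This two-pointer merge is my suggestion rather than anything from the original code, and the sub name pair_with_latest_start is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pair each end position with the latest start position
# before it. Both lists must be sorted ascending. Each list is traversed once.
sub pair_with_latest_start {
    my ($starts, $ends) = @_;
    my @pairs;
    my $i = 0;          # index of the next unconsumed start
    my $latest;         # latest start seen so far (undef if none yet)
    for my $e (@$ends) {
        # Advance past every start that lies before this end.
        while ($i < @$starts && $starts->[$i] < $e) {
            $latest = $starts->[$i++];
        }
        push @pairs, "$latest..$e" if defined $latest;
    }
    return @pairs;
}

my @pairs = pair_with_latest_start([6, 36, 69], [42, 57, 90]);
print "@pairs\n";    # 36..42 36..57 69..90
```

The other readings need different bookkeeping (for "every pair" there is no shortcut, since the output itself can be O(n*m) long), so this only pays off once you know which rule you want.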

In reply to Re: Orf subsequences by gone2015
in thread Orf subsequences by odegbon
