
So basically, as you rightly point out, I want to capture those low (start) and high (end) values. It's hard to see because this forum's formatting doesn't preserve the column layout, but if you look at:

#col_1    col_2  col_3                col_4        col_5
GTCT          GC TTCAGTGACTTCGAGGCGCG GC           GTCC

This, of course, is a segment of a larger genetic sequence made up of the nucleotides adenine (A), thymine (T), cytosine (C) and guanine (G).

G binds to C, and A binds to T.

It just so happens that at this position the sequence can fold back and bind to itself, forming a hairpin; this is more popularly referred to as a 'genetic palindrome'.
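The pairing rules above are easy to check in code. Here is a minimal sketch, shown in Python (the same logic ports directly to Perl). It assumes both stem arms are written 5'->3' exactly as they appear in columns 2 and 4, so the left arm must equal the reverse complement of the right arm. The helper names `reverse_complement` and `stem_mismatches` are hypothetical, not from any library.

```python
# Pairing rules from the post: G pairs with C, A pairs with T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def stem_mismatches(left_arm: str, right_arm: str) -> int:
    """Count positions where the two stem arms fail to base-pair.

    Assumes both arms are given 5'->3' as in the data columns, so the
    outermost base of the left arm pairs with the last base of the right arm.
    """
    if len(left_arm) != len(right_arm):
        raise ValueError("stem arms must be the same length")
    expected = reverse_complement(left_arm)
    return sum(1 for a, b in zip(expected, right_arm) if a != b)


if __name__ == "__main__":
    print(reverse_complement("CTGC"))       # GCAG
    print(stem_mismatches("CTGC", "GCAG"))  # 0 (a perfect 4-bp stem)
```

Returning a mismatch count rather than a plain yes/no leaves room for tools that tolerate imperfect stems.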

I'll try to explain.

There are 5 columns:

- Columns 2-4 represent the entire hairpin.
- Columns 2 and 4 represent the stem.
- Column 3 represents the 'spacer' or 'gap', which forms the loop.
- Columns 1 and 5 represent the nucleotides flanking either side of the hairpin.
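The column layout above maps naturally onto a small record type. A minimal Python sketch, assuming each line holds exactly the five whitespace-separated columns of the example at the top. The class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HairpinRow:
    """One row of the 5-column layout (field names are hypothetical)."""
    left_flank: str   # column 1
    left_stem: str    # column 2
    spacer: str       # column 3, the loop
    right_stem: str   # column 4
    right_flank: str  # column 5

    @property
    def hairpin(self) -> str:
        # Columns 2-4 together make up the whole hairpin.
        return self.left_stem + self.spacer + self.right_stem

    @property
    def stem_length(self) -> int:
        # Number of base pairs in the stem.
        return len(self.left_stem)


def parse_row(line: str) -> HairpinRow:
    """Split a whitespace-separated 5-column line into a HairpinRow."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    return HairpinRow(*fields)


if __name__ == "__main__":
    row = parse_row("GTCT          GC TTCAGTGACTTCGAGGCGCG GC           GTCC")
    print(row.hairpin)      # GCTTCAGTGACTTCGAGGCGCGGC
    print(row.stem_length)  # 2
```

Rows that also carry `start .. end` coordinates, like the ones further down, would need those leading fields split off first.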

     ______
    /      \
   |        |   <--- The SPACER
   \       / 
    \     /
     C---G  <--- The STEM  
     G---C  
____/     \_____  <--- The FLANK

Now, you may notice that some rows in my data represent the same hairpin, differing only in that the stem is extended. For example:

12 ..      35   TCT          GC TTCAGTGACTTCGAGGCGCG GC           AGCT
11 ..      36   GTC         TGC TTCAGTGACTTCGAGGCGCG GCA          GCTG
10 ..      37   GGT        CTGC TTCAGTGACTTCGAGGCGCG GCAG         CTGC
 9 ..      38   TGG       TCTGC TTCAGTGACTTCGAGGCGCG GCAGC        TGCT
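The relationship between these rows can be tested pairwise: a longer row extends a shorter one if the loop is the same, the coordinates widen by the number of added base pairs on each side, and the shorter stem arms sit next to the loop inside the longer ones. A minimal Python sketch, assuming `start` and `end` are the inclusive coordinates of columns 2-4 (this fits the data: 35 - 12 + 1 = 24 = 2 + 20 + 2). The helper names `parse` and `extends` are hypothetical.

```python
ROWS = [
    "12 ..      35   TCT          GC TTCAGTGACTTCGAGGCGCG GC           AGCT",
    "11 ..      36   GTC         TGC TTCAGTGACTTCGAGGCGCG GCA          GCTG",
    "10 ..      37   GGT        CTGC TTCAGTGACTTCGAGGCGCG GCAG         CTGC",
    " 9 ..      38   TGG       TCTGC TTCAGTGACTTCGAGGCGCG GCAGC        TGCT",
]


def parse(line):
    """Return (start, end, left_stem, spacer, right_stem) from one row."""
    start, _dots, end, _lflank, left, spacer, right, _rflank = line.split()
    return int(start), int(end), left, spacer, right


def extends(outer, inner):
    """True if `outer` is `inner` with k > 0 extra base pairs on the stem."""
    o_start, o_end, o_left, o_spacer, o_right = outer
    i_start, i_end, i_left, i_spacer, i_right = inner
    k = len(o_left) - len(i_left)
    return (
        k > 0
        and o_spacer == i_spacer         # same loop
        and o_start == i_start - k       # grows k bases to the left...
        and o_end == i_end + k           # ...and k bases to the right
        and o_left.endswith(i_left)      # old left arm still next to the loop
        and o_right.startswith(i_right)  # old right arm still next to the loop
    )


rows = [parse(r) for r in ROWS]
for inner, outer in zip(rows, rows[1:]):
    print(outer[0], "extends", inner[0], extends(outer, inner))
```

Each consecutive pair reports `True`, i.e. every row is the previous one with one more base pair added to the stem.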

Of these four candidate hairpins, I want to choose the most extended one, which is:

 9 ..      38   TGG       TCTGC TTCAGTGACTTCGAGGCGCG GCAGC        TGCT

This is because its 5-base stem is more robust than the relatively flimsy stem of:

12 ..      35   TCT          GC TTCAGTGACTTCGAGGCGCG GC           AGCT

That row has a measly stem of only 2 bases and probably won't hold the hairpin together long enough amid the bustle of molecular processing to have any real influence on gene regulation or what have you. But then again, nature/biology is full of surprises and exceptions, as always... :S

So what I want to do is write a script that reads in these start and end values, detects when several rows describe what is essentially one hairpin, and keeps only the row with the most extended stem. As far as I know, the data allow this.
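One way to do this, sketched in Python, is to give each row a key that stays the same as the stem grows: the spacer plus `start + end`. Extending the stem by one base pair lowers `start` by 1 and raises `end` by 1, so the sum does not change. Within each key, keep the row with the longest stem. This assumes rows are laid out as `start .. end` followed by the five columns, with inclusive coordinates; the function and field names are hypothetical.

```python
import sys


def parse(line):
    """Split 'start .. end flank stem spacer stem flank' into a dict."""
    start, _dots, end, _lf, left_stem, spacer, _rs, _rf = line.split()
    return {
        "start": int(start),
        "end": int(end),
        "spacer": spacer,
        "stem_len": len(left_stem),
        "line": line.rstrip("\n"),
    }


def most_extended(lines):
    """Keep one row per hairpin: the one with the longest stem."""
    best = {}
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and '#' header lines
        row = parse(line)
        # Rows sharing the loop and the hairpin centre are the same hairpin.
        key = (row["start"] + row["end"], row["spacer"])
        if key not in best or row["stem_len"] > best[key]["stem_len"]:
            best[key] = row
    # Report the surviving hairpins in order along the sequence.
    return sorted(best.values(), key=lambda r: r["start"])


if __name__ == "__main__":
    for row in most_extended(sys.stdin):
        print(row["line"])
```

Run it as, e.g., `python collapse_hairpins.py < hairpins.txt` (both file names are placeholders); on the four rows above it prints only the `9 .. 38` row. Keying on the spacer assumes the loop itself never changes between extensions; if your data can shift the loop boundary, such rows would be kept as separate hairpins.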

I hope this helps.

In reply to Re^4: Making use of a hash of an array... by Peter Keystrokes
in thread Making use of a hash of an array... by Peter Keystrokes
