Hello,
But the point is that the dataset is large, and it is not restricted to the two example clusters I mentioned. OK, let me be clearer: I have a dataset that is 900 MB large, so it will contain many thousands of clusters.
When parsing the file, you read the first line. For example, let the first line be:
1 800 816 23
We have to concentrate on the second and third columns: 800 is the hit_start and 816 is the hit_stop. If the next line has a hit lying less than 200 base pairs away, add it to the first one, and keep going until you can no longer find any hit within the 200 base pair gap.
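The per-line decision described above can be sketched as two small helpers. This is a minimal sketch, not the poster's code, and it rests on two assumptions: columns are whitespace-separated, and "less than 200 base pairs" means the new hit's start lies less than 200 bp beyond the current cluster_stop. The names `parse_hit`, `joins_cluster` and `MAX_GAP` are hypothetical.

```python
MAX_GAP = 200  # assumed threshold in base pairs, taken from the description above


def parse_hit(line):
    """Return (hit_start, hit_stop) from columns 2 and 3 of a whitespace-separated line."""
    fields = line.split()
    return int(fields[1]), int(fields[2])


def joins_cluster(cluster_stop, hit_start, max_gap=MAX_GAP):
    """True if the hit starts less than max_gap bp after the current cluster_stop.

    Overlapping hits give a negative gap, so they always join.
    """
    return hit_start - cluster_stop < max_gap
```

For example, `parse_hit("1 800 816 23")` gives `(800, 816)`, and a following hit starting at 802 joins because 802 - 816 is negative.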
So if you then encounter hits like these:
1 802 818 24
1 804 820 32
1 804 820 44
then you have to merge all of them into one cluster ranging from 800 to 820.
In this case, cluster_start would be 800 and cluster_stop would be 820.
Keep moving on like this. If there aren't any hits within this range, you have to start the next cluster, with a different cluster_start and cluster_stop.
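The whole procedure can be sketched as a single pass that reads one line at a time, so a 900 MB file never has to be held in memory. This is a hedged sketch under stated assumptions: the input is sorted by hit_start, a hit joins when it starts less than 200 bp after the current cluster_stop, and the first and fourth columns (whose meaning the post does not state) are ignored. `cluster_hits` is a hypothetical name.

```python
def cluster_hits(lines, max_gap=200):
    """Yield (cluster_start, cluster_stop) for each cluster, one line at a time.

    Assumes lines are sorted by hit_start (column 2) and that a hit joins the
    current cluster when it starts less than max_gap bp after cluster_stop.
    """
    cluster_start = cluster_stop = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        hit_start, hit_stop = int(fields[1]), int(fields[2])
        if cluster_start is not None and hit_start - cluster_stop < max_gap:
            # Close enough: extend the current cluster.
            cluster_stop = max(cluster_stop, hit_stop)
        else:
            # First hit, or gap too large: emit the previous cluster, open a new one.
            if cluster_start is not None:
                yield cluster_start, cluster_stop
            cluster_start, cluster_stop = hit_start, hit_stop
    if cluster_start is not None:
        yield cluster_start, cluster_stop  # flush the last cluster
```

With the four example lines above this yields the single cluster `(800, 820)`. To process a real file, pass the open file handle directly, e.g. `with open("hits.txt") as fh:` followed by `for start, stop in cluster_hits(fh): ...` (the file name is hypothetical); iterating over a file object reads it line by line.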