That makes sense only for those pairs where the second number is higher. What about the pairs where the second number is lower? Is that a negative length?

I'm picturing your pairs as ranges on a line. So you have one line, and each row of data represents a range on that line. If I understand correctly, you want to group the rows together such that the ranges they represent all fall within a span of 200. What do you do with rows that are longer than 200 all by themselves? And do you want to group by the centers or by the end points somehow? If you treat the start and stop as coordinates it becomes easier, but I wonder whether it has any meaning then.

No matter what, you need to define a formula that gives us the distance between two data points. Then clustering is just a matter of applying an algorithm that uses that distance function. So how far apart are 802,818 and 804,820? I might be inclined to call the distance the average distance between ends, or (abs(802-804) + abs(818-820)) / 2 = 2. By the same formula the distance between 105,1 and 802,818 is 757, but that doesn't really make a ton of sense either, and you still have the confusion of a pair where the end is before the beginning.
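
To make the "average distance between ends" idea concrete, here is a minimal sketch. The sub name avg_end_distance and the choice to pass each pair as an array ref [start, end] are made up for illustration; nothing in the thread says how the data is actually stored.

```perl
use strict;
use warnings;

# Average of the start-to-start and end-to-end gaps between two pairs.
# Each pair is an array ref [start, end] (an assumption for this sketch).
sub avg_end_distance {
    my ($p, $q) = @_;
    return ( abs( $p->[0] - $q->[0] ) + abs( $p->[1] - $q->[1] ) ) / 2;
}

print avg_end_distance( [802, 818], [804, 820] ), "\n";    # prints 2
print avg_end_distance( [105, 1],   [802, 818] ), "\n";    # prints 757
```

Note that this does nothing special for a reversed pair like 105,1. It just compares first number to first number and second to second, which is exactly the part that still needs a decision.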

Update: The other option might be to use the distance between the centers: abs( (802 + 818)/2 - (804 + 820)/2 ) = 2 apart. That would make (105,1) and (802,818) 757 bp apart.
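
A rough sketch of the center-based version, plus one very simple way to turn it into groups. Everything here is an assumption for illustration: the sub names, the array-ref pairs, and especially the grouping rule (sort by center, start a new group once a center is more than 200 from the first center in the current group). It is one possible reading of "within a 200 range", not necessarily the one you want.

```perl
use strict;
use warnings;

# Midpoint of a pair; the same for (105,1) and (1,105), so order doesn't matter.
sub center {
    my ($p) = @_;
    return ( $p->[0] + $p->[1] ) / 2;
}

# Distance between the midpoints of two pairs.
sub center_distance {
    my ( $p, $q ) = @_;
    return abs( center($p) - center($q) );
}

# Greedy grouping (hypothetical rule): walk the pairs in order of center and
# open a new group when a pair is more than $max from the group's first pair.
sub group_by_center {
    my ( $max, @pairs ) = @_;
    my @sorted = sort { center($a) <=> center($b) } @pairs;
    my @groups;
    for my $p (@sorted) {
        if ( @groups && center_distance( $groups[-1][0], $p ) <= $max ) {
            push @{ $groups[-1] }, $p;
        }
        else {
            push @groups, [$p];
        }
    }
    return @groups;
}

my @groups = group_by_center( 200, [802, 818], [105, 1], [804, 820] );
print scalar(@groups), " groups\n";    # prints "2 groups"
```

Using the center sidesteps the reversed-pair problem, but it also throws away the length of each range, so a row longer than 200 is grouped purely by where its middle falls.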


___________
Eric Hodges

In reply to Re^3: how to get clusters? by eric256
in thread how to get clusters? by sirna
