
That makes sense only for the pairs where the second number is higher. What about the pairs where the second number is lower? Is that a negative length?

I'm picturing your pairs as ranges on a line. So you have one line, and each row of data represents a range on that line. If I understand correctly, you want to group the rows of data together such that the ranges they represent all fall within a span of 200. What do you do with rows that are longer than 200 all by themselves? And do you want to group by the centers or by the end points somehow? If you treat the start and stop as coordinates it becomes easier, but I wonder if it has any meaning then?

No matter what, you need to define a formula that gives us the distance between two data points. Then clustering is just a matter of applying an algorithm using that distance function. So how far apart are 802,818 and 804,820? I might be inclined to call the distance the average distance between ends, or (abs(802-804) + abs(818-820)) / 2 = 2. By the same formula the distance between 105,1 and 802,818 is (abs(105-802) + abs(1-818)) / 2 = 757, but that doesn't really make a ton of sense either, and you still have the confusion of a pair where the end is before the beginning.
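
Here is a minimal sketch of that idea, written in Python purely for illustration. The names `end_distance` and `cluster` are made up for this example, and both the naive single-linkage grouping and the cutoff of 200 are assumptions on my part, not something settled in this thread:

```python
def end_distance(a, b):
    """Average distance between the matching ends of two (start, stop) pairs."""
    return (abs(a[0] - b[0]) + abs(a[1] - b[1])) / 2


def cluster(pairs, dist, cutoff=200):
    """Naive single-linkage grouping (an assumed algorithm, one of many).

    A pair joins every existing cluster that has at least one member
    within `cutoff` of it; those clusters are merged into one.
    """
    clusters = []
    for p in pairs:
        near, far = [], []
        for c in clusters:
            if any(dist(p, q) <= cutoff for q in c):
                near.append(c)
            else:
                far.append(c)
        merged = [q for c in near for q in c] + [p]
        clusters = far + [merged]
    return clusters


pairs = [(802, 818), (804, 820), (105, 1)]
print(end_distance(pairs[0], pairs[1]))  # 2.0
print(end_distance(pairs[2], pairs[0]))  # 757.0
print(cluster(pairs, end_distance))
# [[(802, 818), (804, 820)], [(105, 1)]]
```

Because `cluster` takes the distance function as an argument, you can swap in a different definition of distance without touching the grouping code. Note that single linkage can chain: a cluster may end up spanning more than 200 overall even though each member is within 200 of some other member.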

Update: The other option might be to use the distance between the centers: abs( (802 + 818)/2 - (804 + 820)/2 ) = abs(810 - 812) = 2 apart. That would make (105,1) and (802,818) 757 bp apart.
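
The center-based version is just as short. Again this is only a sketch in Python, and `center_distance` is a name invented for the example:

```python
def center_distance(a, b):
    """Distance between the midpoints of two (start, stop) pairs."""
    return abs((a[0] + a[1]) / 2 - (b[0] + b[1]) / 2)


print(center_distance((802, 818), (804, 820)))  # 2.0
print(center_distance((105, 1), (802, 818)))    # 757.0
print(center_distance((105, 1), (1, 105)))      # 0.0
```

The last line shows one consequence of using centers: a reversed pair such as (105,1) has the same midpoint as (1,105), so this distance ignores orientation entirely. Whether that is what you want depends on what a reversed pair means in your data.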


___________
Eric Hodges