comment on

Hi,

Well i have a question for you guys. so what i need is, i need to do some interval searching. Example: I have a set of random (randomly distributed) intervals (all related in the following way a<=b, where a and b are start and stop positions of intervals)
now what i need to do is to compute the coverage of a certain position in my interval. so if i have :

1..10 - interval
 and 

2..5 - subinterval
4..7 - subinterval
[download]

my coverage of the 1-10 would be :

1->0
2->1
3->1
4->2
5->2
6->1
7->1
8->0
9->0
10->0
[download]

the naive solution to do this would be to create an array and then for every interval position ++ the value indexed by the position. BUT this is not an option since i don't have enough memory on my machine (the intervals are as long ad 10^9 positions). Now what i could do is something like this:


#!/usr/bin/perl

use Bit::Vector;
use Data::Dumper;
use strict;



my @vecArray;
my $x=0;
while($x<4){
    $vecArray[$x] =  Bit::Vector->new(21);
    my $int = int(rand(21));
    my $min = $int;
    my $max = $int<=20-5 ? $int+5 : $int; 
    $vecArray[$x]->Interval_Fill($min,$max);
    $x++;
}
my $set1=  Bit::Vector->new(21);
$set1->Intersection($vecArray[0],$vecArray[2]);
my $start = 0;

while ($start < $set1->Size() and my ($min, $max) = $set1->Interval_Sc
+an_inc($start)){
    print $min . "-" . $max . "\n";
    $start = $max+1 ;
}
[download]

so, create a vector (or should i say several of them) and fill it with subinterval positions... Oh by the way every set of subintervals contains only non-overlapping intervals, meaning there cannot be a case as stated above , but the subintervals from different sets CAN overlap. so in the above case one subinterval would be from one set and the other from a different set(The number of intervals in both subinterval sets can be as high as interval/avg(length of subinterval)).
So this approach is very cheap when it comes to memory and computation and is very practical since i can pinpoint the location of two intersecting sets,but how to do a multi intersection and to do it efficiently, meaning no pairwise comparisons(it would PROBABLY be unacceptable since the number of subsets can rise to the size of the underlying interval), is is a completely different thing. So this is the question i cannot answer or am i just too stupid and tired to see the solution (I had a feeling that, that last beer was a little too much, dammit :)). Summary:


|interval| = X
|subinterval| = X/n  (n- number of intervals)
Y=X  (Y - nubmber of subinterval sets)

O(X^2)   - not good if implemented naively
[download]

Does someone has a suggestion , tip, alternative approach ??

Cheers

UPDATE: Thanks ppl, these three replays gave me quite a lot to think about. so, to give a quick global thank you and a few answers to monks that have replied so far.
I know you guys are not mind readers but you are really good at it :). so:

zek152 : This is a nice idea and as soon as you posted it i started to work on it, actually i knew i read about it somewhere before (never had a need to actually do any implementation though) so when i saw your post i started working on it straight away. but my original idea was the same as the one the BrowserUk posted, though i didn't realised that it can be achieved it the way he did it. so the answer to BrowserUk's question is : yes. the thing i had in mind when i was writing this post was that i have to do a pair-wise comparison for every two possible pairs of vectors then we are talking about "for within a for loop" and that is O(x^2) but as you pointed out looping the intersect through vectors does the same thing essentially (meaning useful to me) but in the O(x) time. so once more a fresh pair of eyes see better then sleepy old ones.

Of course searching through start and stop indexes is a good solution but unfortunately my intervals are rather short (sorry for not stating that) which means too large index arrays... But thanx.

So thank you guys once more and this is definitely going to be a good reference post for me , of course if anyone else has an alternative strategy he/she is more then welcome to post it :)

Cheers

In reply to searching a vector array by baxy77bax

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.