A range is any pair of legal coordinates. It can be 'simple', like (5,20) which means 5,6,7,...,19,20, or 'wrapped' like (95,10) which means 95,96,..,99,100,1,2,3,...,9,10. Note that the coordinate system start from 1 (not 0) and that ranges are always close or inclusive (i.e. the start and end coordinate of a range are both considered in range).
I have a bunch of ranges. For each coordinate i in the coordinate system (i.e. foreach i in (1 .. max_length), I want to find the size of the smallest window centered around i, such that no range fully contains this window.
To make things clearer, let's suppose max_length = 10.
1234567890 coordinate ---------- +++ range (1,3) ++++++ range (2,7) ++ + range (0,2) ---------- 5557753113 size of smallest window...
For coordinate 1, if we look at a window of size 1 that is centered at 1, it is just (1,1). It is covered by a range (actually, two ranges), so we continue for a larger window size (let's only look at odd-sized windows). Let's look at a window of size 3 that is centered at 1. This is (0,2) and it's still covered. Now, we get to a window of size 5 which is (9,3). this is not covered by any range so the result for coordinate 1 is 5. We do the same for each coordinate. Note that for coordinate 8 we stop at the first size (1) since (8,8) is not covered by any range. The other values are displayed at the last line of the above code.
Now, I wish to calculate this last line quickly. the actual size of the coordinate system is around 4M and the number of ranges is about 25k, each of length 5k.
What I currently do is to initialize a results array of size max_length (i.e. 4M) to zeros. Then, for each range (start, end) I iterate over all coordinates i in (start,end). I take min(i-start,end-i) (i.e. the distance to the nearest end of the range) and update results(i) with it if this val is larger.
The idea is that for each coordinate I only care a bout the 'largest' range that contains this coordinate -- largest in the sense its nearest end is most distant.
This works, but it means I actually do some 25k*5k=125M 'operations' (for each range, I go over all the coordinates in it).
I usually wouldn't care, but it takes almost half an hour (with some loading of data and writing, but I have no doubt the multiple scanning takes most of the time). Since I need to this for some 1000 times (simulations), this becomes a problem.
Any idea of making this process more efficient?
I would try to upload some code later today, but I have to work on it a bit so it could become a working example for you to play with. In the meantime, if you have any conceptual ideas - I'd love to hear them.
Thank you
Dave
UPDATE
Sorry for the delay, but I tried to simplify things as much as I could to get the simplest working example. Here it is:
use 5.012; use List::Util qw (min max); my $max_length = 10; my @ranges = ( [ 1, 3 ], [ 2, 7 ], [ 10, 2 ] ); # foreach $i in 1 .. $max_length: $res_a->[$i] == half size of # smallest odd-sized window which is centered at $i and not covered # by any range my $res_a = ranges_a_to_min_uncovered_a(\@ranges); foreach my $i (1 .. $max_length) { print $res_a->[$i]*2+1, " "; # print full size # just to be consistent with the orig +inal post # prints '5 5 5 7 7 5 3 1 1 3' } # gets an array of ranges # returns a ref to an arr hich satisfies: # foreach $i in 1 .. $max_length $ret->[$i] == # half size of smallest odd-sized window which is centered at $i and n +ot covered # by any range sub ranges_a_to_min_uncovered_a { my $ranges_a = shift // die; # default value is zero - a coordiante is not covered by any range # arrat length is $max_length+1 since the first position (0) is ne +ver used # (coordinates start from 1) my @min_uncovered_window_half_sizes = (0) x ( $max_length + 1 ); foreach my $range ( @{$ranges_a} ) { my $start = $range->[0]; my $end = $range->[1]; # iterate all coordinates in range if ( $start <= $end ) { # simple range for my $i ( $start .. $end ) { ranges_a_to_min_uncovered_a_helper( $i, $range, \@min_ +uncovered_window_half_sizes ); } } else { # wrapped range for my $i ( $start .. $max_length, 1 .. $end ) { ranges_a_to_min_uncovered_a_helper( $i, $range, \@min_ +uncovered_window_half_sizes ); } } } # foreach my $range return \@min_uncovered_window_half_sizes; } # gets a range, a coordinate within it and $min_uncovered_a # updates $min_uncovered_a[$coordinate] if needed sub ranges_a_to_min_uncovered_a_helper { my $coordinate = shift // die; my $range = shift // die; my $min_uncovered_a = shift // die; my $dist = min( dist( $range->[0], $coordinate ), dist( $coordinat +e, $range->[1] ) ); if ( $dist > $min_uncovered_a->[$coordinate] ) { $min_uncovered_a->[$coordinate] = $dist; } } # returns the length covered between $start and $end (both inclusive) sub dist { my $start = shift // die; my $end = shift // die; if ( $end >= $start ) { # simple range return $end - $start + 1; } # warpped range return $max_length - $start + 1 + $end; }
I also updated the title of this post so it'll be a bit less mysterious...
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |