I have a circular coordinate system. A legal coordinate is any integer in (1, max_length). For simplicity, we shall assume max_length=100.

A range is any pair of legal coordinates. It can be 'simple', like (5,20) which means 5,6,7,...,19,20, or 'wrapped' like (95,10) which means 95,96,..,99,100,1,2,3,...,9,10. Note that the coordinate system start from 1 (not 0) and that ranges are always close or inclusive (i.e. the start and end coordinate of a range are both considered in range).

I have a bunch of ranges. For each coordinate i in the coordinate system (i.e. foreach i in (1 .. max_length), I want to find the size of the smallest window centered around i, such that no range fully contains this window.

To make things clearer, let's suppose max_length = 10.

1234567890 coordinate ---------- +++ range (1,3) ++++++ range (2,7) ++ + range (0,2) ---------- 5557753113 size of smallest window...

For coordinate 1, if we look at a window of size 1 that is centered at 1, it is just (1,1). It is covered by a range (actually, two ranges), so we continue for a larger window size (let's only look at odd-sized windows). Let's look at a window of size 3 that is centered at 1. This is (0,2) and it's still covered. Now, we get to a window of size 5 which is (9,3). this is not covered by any range so the result for coordinate 1 is 5. We do the same for each coordinate. Note that for coordinate 8 we stop at the first size (1) since (8,8) is not covered by any range. The other values are displayed at the last line of the above code.

Now, I wish to calculate this last line quickly. the actual size of the coordinate system is around 4M and the number of ranges is about 25k, each of length 5k.

What I currently do is to initialize a results array of size max_length (i.e. 4M) to zeros. Then, for each range (start, end) I iterate over all coordinates i in (start,end). I take min(i-start,end-i) (i.e. the distance to the nearest end of the range) and update results(i) with it if this val is larger.

The idea is that for each coordinate I only care a bout the 'largest' range that contains this coordinate -- largest in the sense its nearest end is most distant.

This works, but it means I actually do some 25k*5k=125M 'operations' (for each range, I go over all the coordinates in it).

I usually wouldn't care, but it takes almost half an hour (with some loading of data and writing, but I have no doubt the multiple scanning takes most of the time). Since I need to this for some 1000 times (simulations), this becomes a problem.

Any idea of making this process more efficient?

I would try to upload some code later today, but I have to work on it a bit so it could become a working example for you to play with. In the meantime, if you have any conceptual ideas - I'd love to hear them.

Thank you

Dave

UPDATE

Sorry for the delay, but I tried to simplify things as much as I could to get the simplest working example. Here it is:

use 5.012; use List::Util qw (min max); my $max_length = 10; my @ranges = ( [ 1, 3 ], [ 2, 7 ], [ 10, 2 ] ); # foreach $i in 1 .. $max_length: $res_a->[$i] == half size of # smallest odd-sized window which is centered at $i and not covered # by any range my $res_a = ranges_a_to_min_uncovered_a(\@ranges); foreach my $i (1 .. $max_length) { print $res_a->[$i]*2+1, " "; # print full size # just to be consistent with the orig +inal post # prints '5 5 5 7 7 5 3 1 1 3' } # gets an array of ranges # returns a ref to an arr hich satisfies: # foreach $i in 1 .. $max_length $ret->[$i] == # half size of smallest odd-sized window which is centered at $i and n +ot covered # by any range sub ranges_a_to_min_uncovered_a { my $ranges_a = shift // die; # default value is zero - a coordiante is not covered by any range # arrat length is $max_length+1 since the first position (0) is ne +ver used # (coordinates start from 1) my @min_uncovered_window_half_sizes = (0) x ( $max_length + 1 ); foreach my $range ( @{$ranges_a} ) { my $start = $range->[0]; my $end = $range->[1]; # iterate all coordinates in range if ( $start <= $end ) { # simple range for my $i ( $start .. $end ) { ranges_a_to_min_uncovered_a_helper( $i, $range, \@min_ +uncovered_window_half_sizes ); } } else { # wrapped range for my $i ( $start .. $max_length, 1 .. $end ) { ranges_a_to_min_uncovered_a_helper( $i, $range, \@min_ +uncovered_window_half_sizes ); } } } # foreach my $range return \@min_uncovered_window_half_sizes; } # gets a range, a coordinate within it and $min_uncovered_a # updates $min_uncovered_a[$coordinate] if needed sub ranges_a_to_min_uncovered_a_helper { my $coordinate = shift // die; my $range = shift // die; my $min_uncovered_a = shift // die; my $dist = min( dist( $range->[0], $coordinate ), dist( $coordinat +e, $range->[1] ) ); if ( $dist > $min_uncovered_a->[$coordinate] ) { $min_uncovered_a->[$coordinate] = $dist; } } # returns the length covered between $start and $end (both inclusive) sub dist { my $start = shift // die; my $end = shift // die; if ( $end >= $start ) { # simple range return $end - $start + 1; } # warpped range return $max_length - $start + 1 + $end; }

I also updated the title of this post so it'll be a bit less mysterious...


In reply to Can I speed this up? (repetitively scanning ranges in a large array) by daverave

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.