Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: Range overlap (with a lot of ranges)

by salva (Canon)
on May 24, 2010 at 14:26 UTC ( [id://841393]=note: print w/replies, xml ) Need Help??


in reply to Range overlap (with a lot of ranges)

You can look up all the test points in parallel.

Some code showing how to do it for a simplified version of your problem where the ranges cover all the domain (there are no gaps):

#!/usr/bin/perl use strict; use warnings; my @ranges = 0; push @ranges, $ranges[-1] + 1 + int rand 200 for 1..10000; my @tests = map int rand $ranges[-1], 0..1000000; match (\@ranges, \@tests); sub div { my ($border, $tests) = @_; my ($lt, $ge) = ([], []); push @{$_ < $border ? $lt : $ge}, $_ for @$tests; ($lt, $ge); } sub match { my ($ranges, $tests) = @_; if (@$ranges == 1) { if (@$tests) { print "tests in range $ranges->[0]:\n", join(", ", @$tests +), "\n"; } else { print "range $ranges->[0] is empty\n"; } } else { my $pivot = int((@$ranges + 1)/ 2); my ($lt, $ge) = div($ranges->[$pivot], $tests); match([@{$ranges}[0..$pivot-1]], $lt); match([@{$ranges}[$pivot..$#$ranges]], $ge); } }
And a version optimized to not copy data around but to use indexes inside the main arrays:
#!/usr/bin/perl use strict; use warnings; my @ranges = 0; push @ranges, $ranges[-1] + 1 + int rand 200 for 1..1_000_000; my @tests = map int rand $ranges[-1], 0..10_000_000; match(0, scalar @ranges, 0, scalar @tests); sub div { my ($border, $start, $end) = @_; while ($end > $start) { if ($tests[$start] < $border) { $start++; } else { if (--$end > $start) { @tests[$start, $end] = @tests[$end, $start]; } } } $end; } sub match { my ($r_start, $r_end, $t_start, $t_end) = @_; if ($r_end - $r_start <= 1) { if ($t_end - $t_start > 0) { print "tests in range $ranges[$r_start]:\n", join(", ", @t +ests[$t_start..$t_end-1]), "\n"; } else { print "range $ranges[$r_start] is empty\n"; } } else { my $r_pivot = int(($r_start + $r_end)/ 2); my $t_pivot = div($ranges[$r_pivot], $t_start, $t_end); match($r_start, $r_pivot - 1, $t_start, $t_pivot - 1); match($r_pivot, $r_end, $t_pivot, $t_end); } }
This algorithm, for N ranges and M test points where M > N has complexity O(MlogN) and does not require extra memory besides that used to store the ranges and the test points.

On my not so new computer, with the print sentences commented out, it runs in a couple of minutes.

Replies are listed 'Best First'.
Re^2: Range overlap (with a lot of ranges)
by BioLion (Curate) on May 24, 2010 at 15:40 UTC

    I'll have to spend some time looking through this, but it looks like another very viable option. Thanks for your input!

    Just a something something...

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://841393]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (6)
As of 2024-04-23 21:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found