in reply to Can I speed this up? (repetitively scanning ranges in a large array)

I used the following code to remove subranges of longer ranges (just 51 ranges survive from your corrected example):
my $length = 87688;
my @ranges;
while (<>) {
    chomp;
    my @range = split / /;
    push @ranges, [@range];
}

# Using a Schwartzian transform, sort the ranges from the
# longest to the shortest
@ranges =
    map  { [ $_->[1], $_->[2] ] }
    sort { $b->[0] <=> $a->[0] }
    map  {
        my $remainder = $_->[1] - $_->[0];
        $remainder += $length if $remainder < 0;    # range wraps around
        [ $remainder, $_->[0], $_->[1] ]
    } @ranges;

# checks that a subrange is included in a wider range
sub included {
    my ($o0, $o1, $i0, $i1) = @_;
    if ($o0 <= $o1) {
        return ($i0 <= $i1 and $i0 >= $o0 and $i1 <= $o1);
    }
    else {    # outer range is circular
        # a non-wrapping inner range may lie on either side of the wrap;
        # a wrapping inner range must fit on both sides
        return $i0 <= $i1
            ? ($i1 <= $o1 or  $i0 >= $o0)
            : ($i0 >= $o0 and $i1 <= $o1);
    }
}    # included

# pop the shortest remaining range and keep it unless a longer one contains it
my @ranges_kept;
while (my $range = pop @ranges) {
    push @ranges_kept, $range
        unless grep included(@$_, @$range), @ranges;
}

warn scalar @ranges_kept, " ranges kept.\n";
print $_->[0], ' ', $_->[1], "\n" foreach (@ranges_kept);
It takes less than a second on my machine to prune the ranges.
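To make the wrap-around cases easier to check by hand, here is a minimal sketch of the containment test run against a few small ranges. The name `range_contains` and the sample coordinates are made up for illustration (they assume a circle of length 100, with start > end meaning the range wraps).

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if the inner range ($i0,$i1) lies inside
# the outer range ($o0,$o1). A range with start > end wraps around.
sub range_contains {
    my ($o0, $o1, $i0, $i1) = @_;
    if ($o0 <= $o1) {
        # plain outer range: inner must be plain and nested
        return ($i0 <= $i1 && $i0 >= $o0 && $i1 <= $o1) ? 1 : 0;
    }
    if ($i0 <= $i1) {
        # wrapping outer, plain inner: inner fits on one side of the wrap
        return ($i1 <= $o1 || $i0 >= $o0) ? 1 : 0;
    }
    # both wrap: inner must start no earlier and end no later
    return ($i0 >= $o0 && $i1 <= $o1) ? 1 : 0;
}

# [ outer, inner ] pairs on an assumed circle of length 100
my @cases = (
    [ [ 10, 50 ], [ 20, 30 ] ],    # nested plain ranges
    [ [ 10, 50 ], [ 40, 60 ] ],    # overlap only
    [ [ 90, 10 ], [ 95, 99 ] ],    # inner before the wrap point
    [ [ 90, 10 ], [ 2,  8 ] ],     # inner after the wrap point
    [ [ 90, 10 ], [ 95, 5 ] ],     # both wrap, nested
    [ [ 90, 10 ], [ 85, 2 ] ],     # both wrap, inner starts too early
);

for my $case (@cases) {
    my ($outer, $inner) = @$case;
    printf "%d..%d in %d..%d: %s\n", @$inner, @$outer,
        range_contains(@$outer, @$inner) ? 'yes' : 'no';
}
```

The last case is the one that is easy to get wrong: 85..2 ends inside 90..10 but starts before it, so it should not count as contained.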

Re^2: Can I speed this up? (repetitively scanning ranges in a large array)
by daverave (Scribe) on Nov 02, 2010 at 18:24 UTC
    That's a good idea that would surely help. However, I would have to check how much effect it will have.

    The example I gave is very small - less than 1000 ranges compared to the usual 25k. Also, I'm not sure if it represents the normal proportion of "contained" ranges (I would guess it's much lower than the 95% in this specific case).

    So, I will surely incorporate it and try it out, but I'm not confident it will have a great effect in the average case.

    UPDATE: It seems that, after inlining the original object methods, filtering out a range takes about the same time as processing it (perhaps even longer). I have therefore stopped using this for the time being, although it seemed like a very nice idea. Perhaps there's a more efficient way of filtering the ranges?
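One possible answer, offered only as a sketch and not benchmarked on the real data: the pairwise check is quadratic in the number of ranges, whereas a sort followed by a single sweep is O(n log n). The function name `prune_contained` is hypothetical, and the sketch assumes 1-based coordinates in 1..$length, with start > end meaning the range wraps. It unrolls the circle onto 1..2*$length, so a wrapping range a..b becomes a..b+$length and each plain range also gets a shifted copy that can be found inside a wrapping one.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the ranges that are not contained in any
# other range. Assumes coordinates in 1..$length; start > end means wrap.
sub prune_contained {
    my ($length, @ranges) = @_;

    # Drop exact duplicates so two equal ranges do not eliminate each other.
    my %seen;
    @ranges = grep { !$seen{"$_->[0] $_->[1]"}++ } @ranges;

    # Each item is [start, end, index, is_container] on the unrolled line.
    my @items;
    for my $k (0 .. $#ranges) {
        my ($s, $e) = @{ $ranges[$k] };
        if ($s <= $e) {
            push @items, [ $s, $e, $k, 1 ];
            # shifted copy, only ever tested as an inner range
            push @items, [ $s + $length, $e + $length, $k, 0 ];
        }
        else {
            push @items, [ $s, $e + $length, $k, 1 ];
        }
    }

    # Start ascending, then end descending, so that every possible
    # container of an item is visited before the item itself.
    @items = sort { $a->[0] <=> $b->[0] or $b->[1] <=> $a->[1] } @items;

    my @dropped;
    my $max_end = 0;    # largest end among the containers seen so far
    for my $item (@items) {
        my ($s, $e, $k, $is_container) = @$item;
        $dropped[$k] = 1 if $e <= $max_end;    # some earlier range covers it
        $max_end = $e if $is_container and $e > $max_end;
    }

    return map { $ranges[$_] } grep { !$dropped[$_] } 0 .. $#ranges;
}

# Usage on an assumed circle of length 100
my @kept = prune_contained(
    100,
    [ 10, 50 ], [ 20, 30 ], [ 90, 10 ], [ 95, 5 ], [ 85, 2 ],
    [ 1,  8 ],  [ 92, 99 ], [ 20, 30 ], [ 40, 60 ],
);
print "$_->[0] $_->[1]\n" for @kept;    # keeps 10 50, 90 10, 85 2, 40 60
```

The kept ranges come back in input order. Whether this pays off for 25k ranges depends on how many of them are actually contained, so it would need timing against the real data.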