in reply to Overlapping portions of sub strings

You're talking about ranges, I suppose. I assume all colored ranges actually have the same color. If not, you'll have to work out a more elaborate scheme. (Instead of just colors, it's like having italic and bold ranges, and ranges that are both bold and italic.)

First, extract the start and end index for each range, two numbers (end >= start) for each range. Next, see if there is any overlap. That means that the start or end index of one range is inside, or touching an end, of another range, or vice versa. Actually, testing the reverse is easier: two ranges don't overlap, if the starting index of the second range comes after the end index of the first one.

Once you identified two adjoint or overlapping ranges, merge them: start is at the lowest starting index, end is at the highest end index. Be careful to restart all tests with this new range, as the new incarnation of this range now larger than it originally was, and it could overlap with ranges it didn't even touch before.

I'll try for some code, for size. First I order the ranges according to start index. Next I compare the start index for all ranges after the current one, to see if they overlap. If they do, I incorporate it into this one, delete the other one, and start all tests again for the current range. I think that should cover it.

Update: Don't use this code without checking out Thelonius' reply, first. He's right that there are some bugs. Out of historical interest, I won't fix them here.

my @range = ([35, 55], [50, 60], [30, 40], [60, 65], [70, 80]); @range = sort { $a->[0] <=> $b->[0] } @range; for (my $i = 0; $i < @range; $i++) { my $e = $range[$i][1]; for(my $j = $i+1; $j < @range; $j++) { unless($range[$j][0] > $e) { $e = $range[$i][1] = $range[$j][1]; splice @range, $j, 1; redo; } } } use Data::Dumper; print Dumper \@range;
Result:
$VAR1 = [ [ 30, 65 ], [ 70, 80 ] ];
which looks about right to me.

Replies are listed 'Best First'.
Re: Re: Overlapping portions of sub strings
by Thelonius (Priest) on Jan 15, 2003 at 20:57 UTC
    No, this code will loop infinitely on many inputs, e.g. my @range3 = ([0, 10], [5, 20]); To fix that, you need redo if $j < @range;

    But it is also buggy for contained intervals. The input my @range4 = ([0, 20], [5, 8]); will give the output interval ([0, 8]). Here is the complete correct code:

    for (my $i = 0; $i < @range; $i++) { my $e = $range[$i][1]; for(my $j = $i+1; $j < @range; $j++) { unless($range[$j][0] > $e) { if ($range[$j][1] > $e) { $e = $range[$i][1] = $range[$j][1]; } splice @range, $j, 1; redo if $j < @range; } } }
      Apparently you're right on both accounts. That of the nested range is a stupid mistake of mine, thinking that the end of the second range would always be the larger one because its start is. In fact, what I had in my head before I wrote anything down, was using max():
      use List::Util qw(max); $e = $range[$i][1] = max($e, $range[$j][1]);
      But in fact, your code is simpler, thus, better.

      But that of the infinite loop surprised me, a lot. I had assumed that the condition, the second expression in the for(;;) line, was being tested before any attempt at running the loop body, for each loop.

      But apparently, something goes wrong if you do a redo: