in reply to Re: Re: Overlapping portions of sub strings
in thread Overlapping portions of sub strings
I've attempted to structure this so that what gets displayed when you use the d/l code link at the bottom, will be a working program, slightly modified from the original. I won't know if it worked until I submit it.
Hopefully, this is a slightly clearer version than the original.
#! perl -slw use strict;
OK, the basic mechanism is this:
Given the input ranges
my @ranges = ( [17, 19], [34, 39], [26, 29], [53, 57], [43, 47], [58, 59], [40, 4 +5], [30, 33], [20, 24], [10, 15], [ 6, 9], [ 1, 4], [35, 45], [ 7, 1 +5], );
If we sort these numerically and lay them out graphically, we can easily project the 1's down vertically and see the resultant ranges on the bottom line. That's exactly what my code is doing.
=pod comment 1 2 3 4 5 6 Range 123456789012345678901234567890123456789012345678901234567890 [ 1, 4] 1111 [ 6, 9] 1111 [ 7,15] 111111111 [10,15] 111111 [17,19] 111 [20,24] 11111 [26,29] 1111 [30,33] 1111 [34,39] 111111 [35,45] 11111111111 [40,45] 111111 [43,47] 11111 [53,57] 11111 [58,59] 11 1111 1111111111 11111111 1111111111111111111111 1111111 Coaleced ranges [1-4] [6-15] [17-24] [26-47] [53-59] #! initialise a string of null as long as the string the ranges perta +in to. my $coal = "\0" x 100; #! Only need be 60 in ths example but the exces +s bytes don't effect the results. Using substr as an lvalue, assign a string of range[high]-range[low] x + '1' over the null bytes if the result string. #! fill all the ranges with 1's substr($coal, $_->[0], $_->[1] - $_->[0] + 1) = 1 x ($_->[1] - $_->[0] + + 1) for @ranges; =cut
That might have been clearer written as
Then, it's just a matter of scanning allong the string linearly and recording the starts and stops of the contguous runs of '1''s.#! Only need be 60 in this example but excess bytes don't effect the r +esults. my $result = "\0" x 100; use constant LOW =>0; use constant HIGH =>1; for my $range (@ranges) { substr( $result, $range->[LOW], $range->[HIGH] - $range->[LOW] +1 +) = '1' x ($range->[HIGH] - $range->[LOW] +1 +) }
#! Build the new AoA of ranges by scanning the string records starts a +nd length of string of 1's my (@results, $start); for (0 .. length $result) { #! Looking at each character in the result + string in turn my $c = substr($result, $_, 1); #! Grab the char #! skip to the next char until we find a 1. next if not $start and $c ne '1'; #! Record the start position if we found a 1 and we haven't alread +y got a start. $start = $_ if $c eq '1' and not $start; #! Skip to the next char if we have start pos until we find the en +d (ne '1') next if $start and $c eq '1'; #! We have a start and we found a non-'1', so we have an end... #! so push the start end pair onto the results array push @results, [$start, $_-1]; #! and undef $start to continue the search. $start = undef; } #! Print out the inputs sorted for human consumption, though that isn' +t necessary of the algorithm print map{local $"='-'; "[@$_] "} sort{ $a->[0] <=> $b->[0] } @ranges; + #!" print ''; #! And the results print map{local $"='-'; "[@$_] "}@results; #!" print ''; __END__ c:\test>227155-3 [1-4] [6-9] [7-15] [10-15] [17-19] [20-24] [26-29] [30-33] [34-39] [35 +-45] [40-45] [43-47] [53-57] [58-59] [1-4] [6-15] [17-24] [26-47] [53-59] c:\test>
Note: If your string are of really extreme length and space at a premium, you could use a bit-string instead of bytes ("\0" and "1"), by using vec inplace of substr but I don't get on with vec, and my test string is short.
BTW. If you don't use one yet, the code is a lot clearer to read in a syntax highlighting editor than b&w. Though that's a matter of taste and configuration.
Examine what is said, not who speaks.
The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.
|
|---|