in reply to Re: Overlapping portions of sub strings
in thread Overlapping portions of sub strings

This is quite hard to understand. If you have time, could you please explain this process.
  • Comment on Re: Re: Overlapping portions of sub strings

Replies are listed 'Best First'.
Re: Re: Re: Overlapping portions of sub strings
by BrowserUk (Patriarch) on Jan 16, 2003 at 12:15 UTC

    I've attempted to structure this so that what gets displayed when you use the d/l code link at the bottom, will be a working program, slightly modified from the original. I won't know if it worked until I submit it.

    Hopefully, this is a slightly clearer version than the original.

    #! perl -slw use strict;

    OK, the basic mechanism is this:

    Given the input ranges

    my @ranges = ( [17, 19], [34, 39], [26, 29], [53, 57], [43, 47], [58, 59], [40, 4 +5], [30, 33], [20, 24], [10, 15], [ 6, 9], [ 1, 4], [35, 45], [ 7, 1 +5], );

    If we sort these numerically and lay them out graphically, we can easily project the 1's down vertically and see the resultant ranges on the bottom line. That's exactly what my code is doing.

    =pod comment 1 2 3 4 5 6 Range 123456789012345678901234567890123456789012345678901234567890 [ 1, 4] 1111 [ 6, 9] 1111 [ 7,15] 111111111 [10,15] 111111 [17,19] 111 [20,24] 11111 [26,29] 1111 [30,33] 1111 [34,39] 111111 [35,45] 11111111111 [40,45] 111111 [43,47] 11111 [53,57] 11111 [58,59] 11 1111 1111111111 11111111 1111111111111111111111 1111111 Coaleced ranges [1-4] [6-15] [17-24] [26-47] [53-59] #! initialise a string of null as long as the string the ranges perta +in to. my $coal = "\0" x 100; #! Only need be 60 in ths example but the exces +s bytes don't effect the results. Using substr as an lvalue, assign a string of range[high]-range[low] x + '1' over the null bytes if the result string. #! fill all the ranges with 1's substr($coal, $_->[0], $_->[1] - $_->[0] + 1) = 1 x ($_->[1] - $_->[0] + + 1) for @ranges; =cut

    That might have been clearer written as

    #! Only need be 60 in this example but excess bytes don't effect the r +esults. my $result = "\0" x 100; use constant LOW =>0; use constant HIGH =>1; for my $range (@ranges) { substr( $result, $range->[LOW], $range->[HIGH] - $range->[LOW] +1 +) = '1' x ($range->[HIGH] - $range->[LOW] +1 +) }
    Then, it's just a matter of scanning allong the string linearly and recording the starts and stops of the contguous runs of '1''s.
    #! Build the new AoA of ranges by scanning the string records starts a +nd length of string of 1's my (@results, $start); for (0 .. length $result) { #! Looking at each character in the result + string in turn my $c = substr($result, $_, 1); #! Grab the char #! skip to the next char until we find a 1. next if not $start and $c ne '1'; #! Record the start position if we found a 1 and we haven't alread +y got a start. $start = $_ if $c eq '1' and not $start; #! Skip to the next char if we have start pos until we find the en +d (ne '1') next if $start and $c eq '1'; #! We have a start and we found a non-'1', so we have an end... #! so push the start end pair onto the results array push @results, [$start, $_-1]; #! and undef $start to continue the search. $start = undef; } #! Print out the inputs sorted for human consumption, though that isn' +t necessary of the algorithm print map{local $"='-'; "[@$_] "} sort{ $a->[0] <=> $b->[0] } @ranges; + #!" print ''; #! And the results print map{local $"='-'; "[@$_] "}@results; #!" print ''; __END__ c:\test>227155-3 [1-4] [6-9] [7-15] [10-15] [17-19] [20-24] [26-29] [30-33] [34-39] [35 +-45] [40-45] [43-47] [53-57] [58-59] [1-4] [6-15] [17-24] [26-47] [53-59] c:\test>

    Note: If your string are of really extreme length and space at a premium, you could use a bit-string instead of bytes ("\0" and "1"), by using vec inplace of substr but I don't get on with vec, and my test string is short.

    BTW. If you don't use one yet, the code is a lot clearer to read in a syntax highlighting editor than b&w. Though that's a matter of taste and configuration.


    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: Re: Re: Overlapping portions of sub strings
by matth (Monk) on Jan 16, 2003 at 11:36 UTC
    Sorry. The message above is addressed to BrowserUk.