Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I basically have a long string of letters and I want to extract all letters between certain start and stop positions.

This works fine for me until the positions overlap (i.e. until the next start position comes before the previous stop position). There are up to three separate start and stop positions per sequence, as shown below.

Please can someone show me where I'm going wrong or suggest how to do this? My code is below; it fails to get things that overlap. Thanks!

foreach my $value (@data) {
    # GET THE SEQUENCE FOR EACH $VALUE
    # STARTS AND STOPS ALSO DECLARED UP HERE
    # E.G. START1 = 1; START2 = 192; START3 = 600
    #      STOP1 = 280; STOP2 = 433; STOP3 = 753
    # Want to extract all letters between 1-280, 192-433 and 600-753,
    # where 280 and 192 overlap
    # $sequence is a string with about 1000 letters
    @seq = $sequence;
    my $s = join ('', @seq);
    @seq = split ('', $s);
    for (my $i=1; $i<=@seq; $i++) {
        if (($i >= $start1) && ($i <= $stop1)) {
            push @current_seq, $seq[$i-1];
        }
        if ($i == $stop1) {
            push @current_seq, "\n\n";
        }
        if (($i >= $start2) && ($i <= $stop2)) {
            push @current_seq, $seq[$i-1];
        }
        if ($i == $stop2) {
            push @current_seq, "\n\n";
        }
        if (($i >= $start3) && ($i <= $stop3)) {
            push @current_seq, $seq[$i-1];
        }
        if ($i == $stop3) {
            push @current_seq, "\n\n";
        }
    }
    print "Sub-strings are:\n@current_seq\n";
}

Re: extracting substrings
by Eimi Metamorphoumai (Deacon) on Jan 12, 2005 at 20:45 UTC
    I guess a real question is what do you want to do when they overlap, since you're storing them all into the same array. That said, it seems that you would do a lot better to use substr instead of going through the letters one at a time. Also, instead of using separate $start1, $start2, etc, I'd suggest using an array of start and stop positions.
    for my $i (0..$#start) {
        # substr wants a 0-based offset and a length, not an end position
        $seq[$i] = substr($sequence, $start[$i] - 1, $stop[$i] - $start[$i] + 1);
    }
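    The substr idea can be sketched a little more fully as below. This is only a minimal illustration, assuming the positions are 1-based and inclusive as in the question. The helper name extract_regions and the short sample string are hypothetical stand-ins, not part of the original code. Each region goes into its own array element, so overlapping regions no longer get interleaved.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one substring per [start, stop] pair.
# Assumption: positions are 1-based and inclusive, as in the question.
sub extract_regions {
    my ($sequence, @regions) = @_;
    my @pieces;
    for my $region (@regions) {
        my ($start, $stop) = @$region;
        # substr takes a 0-based offset and a LENGTH, not an end position.
        push @pieces, substr($sequence, $start - 1, $stop - $start + 1);
    }
    return @pieces;
}

# Short stand-in for the ~1000-letter sequence; regions 1-6 and 4-9 overlap.
my $sequence = 'ABCDEFGHIJKLMNOP';
my @pieces   = extract_regions($sequence, [1, 6], [4, 9], [12, 14]);

print "Sub-strings are:\n", join("\n\n", @pieces), "\n";
```

    With the sample string, the three pieces are ABCDEF, DEFGHI and LMN. The overlapping letters DEF appear in both of the first two pieces because each region is extracted independently.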
Re: extracting substrings
by perlsen (Chaplain) on Jan 13, 2005 at 07:08 UTC

    If you wish, you can try this code:

    $sequence=' 1 sen 192 gube 433 siva raj 280 shank prasad 600 san zia 753';
    $START1 = 1;
    $STOP1 = 280;
    $START2 = 192;
    $STOP2 = 433;
    $START3 = 600;
    $STOP3 = 753;
    for $k (1..3) {
        $s='START'."$k";
        $en='STOP'."$k";
        print "******start****";
        (@arr)=$sequence=~m#$$s(.*?)$$en#gsi;
        $ar = join(" ", @arr);
        $ar=~s/\d//gi;
        $ar=~s/\n+/\n/gi;
        print "$ar";
        print "******end***\n";
    }


    output:
    ******start****
    sen
    gube
    siva
    raj
    ******end***
    ******start****
    gube
    ******end***
    ******start****
    san
    zia
    ******end***
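
    The same marker-based matching could also be written without symbolic references, so that it runs under use strict. The sketch below is a variation, not the code above: the @markers array, the \b word boundaries and the whitespace clean-up are assumptions added for illustration, and it keeps only the first match per pair.

```perl
use strict;
use warnings;

# Same sample text as above; the numbers act as markers inside the text.
my $sequence = ' 1 sen 192 gube 433 siva raj 280 shank prasad 600 san zia 753';

# Marker pairs held in one array instead of $START1, $STOP1, and so on.
my @markers = ([1, 280], [192, 433], [600, 753]);

my @found;
for my $pair (@markers) {
    my ($start, $stop) = @$pair;
    # \b keeps the marker 1 from matching inside 192 (an added safeguard).
    my ($text) = $sequence =~ /\b\Q$start\E\b(.*?)\b\Q$stop\E\b/s;
    next unless defined $text;
    $text =~ s/\d+//g;           # drop any other markers inside the span
    $text =~ s/\s+/ /g;          # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;     # trim both ends
    push @found, $text;
}

print "$_\n" for @found;
```

    Note that this treats the numbers as text markers, as the reply does, rather than as character positions in the string.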