monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

I have the following problem. Given a string over the alphabet [ATCGN], where a run of consecutive Ns is considered a "gap", I would like to obtain the last position of each gap segment. The positions are counted in reverse, from the right-hand end of the string; see the examples below:

Example 1:
109876543210
 GNNTCGANNTT
   *     *
return value 2 and 8

Example 2:
109876543210
 GAATCGNNNTT
         *
return value 2

Example 3:
109876543210
 GANNCGNNNNN
    *      *
return value 0 and 7

Can anybody advise how I can go about this? I am stuck with my code below:

my $str = "GNNTCGANNTT";
my @char_arr = split (//,$str);
my $slen = length $str;
foreach my $id (0 .. $#char_arr) {
    if ($char_arr[$id] =~ /N/i) {
        # not sure how to go from here
    }
}

My hopeless approach above also seems very slow. I need this to run quickly, as there are many such strings to process.

Regards,
Edward

Replies are listed 'Best First'.
Re: Identifying Reverse Position of Specific Character in a String
by GrandFather (Saint) on Aug 25, 2006 at 19:45 UTC

    The trick is to reverse the string being searched:

    use strict;
    use warnings;

    my @strs = qw(GNNTCGANNTT GAATCGNNNTT GANNCGNNNNN);

    for my $str (@strs) {
        my $rstr = reverse $str;
        my @starts;

        push @starts, $-[1] while $rstr =~ /(N+)/g;
        if (1 == @starts) {
            print "$str: One N found at @starts\n";
        } elsif (@starts) {
            print "$str: Ns found at @starts\n";
        } else {
            print "No N found in $str\n";
        }
    }

    Prints:

    GNNTCGANNTT: Ns found at 2 8
    GAATCGNNNTT: One N found at 2
    GANNCGNNNNN: Ns found at 0 7

    Note too that the special array @- is used to get the start offset of each match: $-[1] is the offset at which the first capture group, here a run of Ns, begins.
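
To make the use of @- concrete, here is a minimal sketch that wraps the same reverse-then-match idea in a sub. The name gap_positions_from_right is made up for illustration and does not appear in the thread. It reads $-[0] (the start of the whole match) instead of $-[1], which gives the same offsets here because the pattern consists of nothing but the run of Ns.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the reverse-indexed
# position of the right-hand end of every run of Ns in $str.
sub gap_positions_from_right {
    my ($str) = @_;
    my $rstr = reverse $str;    # scalar reverse: offset 0 is now the last character
    my @starts;
    while ( $rstr =~ /N+/g ) {
        push @starts, $-[0];    # $-[0] is the offset where the whole match began
    }
    return @starts;
}

print join( ' ', gap_positions_from_right('GNNTCGANNTT') ), "\n";    # prints "2 8"
```

Reversing once and matching once per gap keeps the work linear in the length of the string, which matters if many strings have to be processed.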


    DWIM is Perl's answer to Gödel
Re: Identifying Reverse Position of Specific Character in a String
by jdporter (Paladin) on Aug 25, 2006 at 19:13 UTC
    sub find_gap_ends {
        local $_ = shift;
        my @gap_end_pos;
        while ( /N(?!N)/g ) {
            push @gap_end_pos, pos;
        }
        @gap_end_pos
    }
    We're building the house of the future together.
      Use for (shift) instead of local $_ = shift to avoid bugs.

        There's no benefit to that in this case, since the function makes no attempt to alter $_.

        Furthermore, using the for(shift) idiom with code that does modify $_ is dangerous, unless your intent is to modify $_[0].

        As for pos, one of the lessons you do (or should) learn early in your Perl edjication is to call pos as soon after the regex that sets it as possible — ideally, immediately after it. Otherwise, you're taking a gamble.

        We're building the house of the future together.
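
The difference between the two idioms discussed above can be seen in a small sketch. Both subs below are hypothetical and written only for illustration: one copies its argument with local $_ = shift, the other aliases it with for (shift), and both then modify $_.

```perl
use strict;
use warnings;

# Hypothetical subs for illustration only; neither appears in the thread.

# local $_ = shift copies the argument, so the caller's variable is untouched.
sub strip_gaps_copy {
    local $_ = shift;
    tr/N//d;
    return $_;
}

# for (shift) aliases $_ to the caller's variable, so tr/// edits it in place.
sub strip_gaps_alias {
    for (shift) {
        tr/N//d;
        return $_;
    }
}

my $seq_a = 'GNNTCGANNTT';
my $seq_b = 'GNNTCGANNTT';
strip_gaps_copy($seq_a);
strip_gaps_alias($seq_b);
print "$seq_a\n";    # prints GNNTCGANNTT (unchanged)
print "$seq_b\n";    # prints GTCGATT (the caller's string was modified)
```

Since find_gap_ends only reads $_, either idiom would behave the same there; the aliasing only matters once the sub starts changing $_.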
Re: Identifying Reverse Position of Specific Character in a String
by jwkrahn (Abbot) on Aug 25, 2006 at 20:11 UTC
    $ perl -le'
    @strings = qw/ GNNTCGANNTT GAATCGNNNTT GANNCGNNNNN /;
    for ( @strings ) {
        push @results, [];
        unshift @{ $results[ -1 ] }, length() - pos() while /N+/g;
    }
    print "@$_" for @results;
    '
    2 8
    2
    0 7
Re: Identifying Reverse Position of Specific Character in a String
by chargrill (Parson) on Aug 25, 2006 at 22:21 UTC

    If you want a regex-only solution:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @strings = qw( GNNTCGANNTT GAATCGNNNTT GANNCGNNNNN );
    my @found;

    for my $string( @strings ){
        my $revstring = reverse $string;
        $revstring =~ m<
            (?:
                .*?
                (?{ [ $^R ? @{ $^R } : () , pos ] })
                NN+
            )+
            (?{ @found = @{ $^R }; })
        >x;
        print "String: $string Positions: ", join( ', ', @found ), "\n";
    }

    Prints:

    String: GNNTCGANNTT Positions: 2, 8
    String: GAATCGNNNTT Positions: 2
    String: GANNCGNNNNN Positions: 0, 7


    --chargrill
    $,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse s +plit//=>$* ){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$# +C]=$/;($#C >$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^ +$$C[$%++]}
Re: Identifying Reverse Position of Specific Character in a String
by roboticus (Chancellor) on Aug 25, 2006 at 19:24 UTC
    Edward--

    Perhaps this'll be of use?

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    while (my $seq=<DATA>) {
        print $seq, ':';
        my $cnt=0;
        my $prev=undef;
        my @arr = split /|/,$seq;
        while (my $c = shift @arr) {
            ++ $cnt;
            if ($c eq 'N') {
                $prev = $cnt;
            }
            else {
                print( $prev, ' ') if $prev;
                $prev = undef;
            }
        }
        print "\n";
    }

    __DATA__
    GNNTCGANNTT
    GAATCGNNNTT
    GANNCGNNNNN

    On my machine, this gives me:

    roboticus@swill ~/PerlMonks
    $ ./RevGap.pl
    GNNTCGANNTT
    :3 9
    GAATCGNNNTT
    :9
    GANNCGNNNNN
    :4 11

    --roboticus
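
The numbers printed above (3 9, 9, 4 11) are gap-end positions counted from the left starting at 1, whereas the question counts from the right starting at 0. A minimal sketch of the conversion, assuming the string length is known; the sub name to_reverse_positions is hypothetical and not part of any reply:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): map 1-based, left-to-right
# gap-end positions onto the 0-based, right-to-left numbering of the question.
sub to_reverse_positions {
    my ( $len, @forward ) = @_;
    my @rev = reverse map { $len - $_ } @forward;    # list reverse: smallest first
    return @rev;
}

my $seq = 'GNNTCGANNTT';
print join( ' ', to_reverse_positions( length($seq), 3, 9 ) ), "\n";    # prints "2 8"
```

The same subtraction applies to the values that pos returns after matching a gap in the unreversed string, since pos is the offset just past the last N.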