Identifying Reverse Position of Specific Character in a String

monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

I have the following problem. Given a string over the alphabet [ATCGN], where a run of Ns is considered a "gap", I would like to obtain the last position of each gap segment. The positions are counted in reverse, from the right-hand end of the string starting at 0; see the examples below:

Example 1:

   109876543210
    GNNTCGANNTT
      *     *

return value 2 and 8

Example 2:

   109876543210
    GAATCGNNNTT
            *

return value 2

Example 3:

   109876543210
    GANNCGNNNNN
       *      *

return value 0 and 7

Can anybody advise how I can go about this? I am stuck with my code below:

my $str = "GNNTCGANNTT";
my @char_arr = split (//,$str);
my $slen = length $str;

foreach my $id (0 .. $#char_arr) {
   if ($char_arr[$id] =~ /N/i) {
     # not sure how to go from here
    }
}

My hopeless approach above also seems very slow. I need this process to run quickly, as there are many such strings to be processed.

Regards,
Edward


Replies are listed 'Best First'.
Re: Identifying Reverse Position of Specific Character in a String by GrandFather (Saint) on Aug 25, 2006 at 19:45 UTC
The trick is to reverse the string being searched:

use strict;
use warnings;

my @strs = qw(GNNTCGANNTT GAATCGNNNTT GANNCGNNNNN);

for my $str (@strs) {
    my $rstr = reverse $str;
    my @starts;

    push @starts, $-[1] while $rstr =~ /(N+)/g;
    if (1 == @starts) {
        print "$str: One N found at @starts\n";
    } elsif (@starts) {
        print "$str: Ns found at @starts\n";
    } else {
        print "No N found in $str\n";
    }
}

Prints:

GNNTCGANNTT: Ns found at 2 8
GAATCGNNNTT: One N found at 2
GANNCGNNNNN: Ns found at 0 7

Note too that the special array @- is used to get the start offset of each run of Ns (`$-[1]` is where the first capture group begins).

DWIM is Perl's answer to Gödel
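
To make the note about @- more concrete, here is a minimal sketch showing both `@-` and its companion `@+` for each run of Ns in the reversed string. The helper name `gap_spans_reversed` is made up for this illustration and does not appear in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return [start, end] offset
# pairs for every run of Ns, measured in the reversed string.
sub gap_spans_reversed {
    my ($str) = @_;
    my $rstr = reverse $str;    # scalar assignment, so the string is reversed
    my @spans;
    while ( $rstr =~ /N+/g ) {
        # $-[0] is the offset where the whole match starts,
        # $+[0] is the offset just past its end.
        push @spans, [ $-[0], $+[0] ];
    }
    return @spans;
}

for my $span ( gap_spans_reversed('GNNTCGANNTT') ) {
    print "run starts at $span->[0], ends before $span->[1]\n";
}
```

For GNNTCGANNTT the start offsets are 2 and 8, the same values the reply above prints.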
Re: Identifying Reverse Position of Specific Character in a String by jdporter (Paladin) on Aug 25, 2006 at 19:13 UTC
sub find_gap_ends {
    local $_ = shift;
    my @gap_end_pos;
    while ( /N(?!N)/g ) {
        push @gap_end_pos, pos;
    }
    @gap_end_pos
}

We're building the house of the future together.
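
As written, `find_gap_ends` reports forward offsets: `pos` is the offset just past each gap, counted from the left. The sketch below shows one possible way to convert those into the right-to-left positions from the question, on the assumption that the reverse position equals the string length minus `pos`. The wrapper `reverse_gap_ends` is a hypothetical name, not something from the thread.

```perl
use strict;
use warnings;

# The function from the reply above, with an explicit return.
sub find_gap_ends {
    local $_ = shift;
    my @gap_end_pos;
    while ( /N(?!N)/g ) {
        push @gap_end_pos, pos;    # offset just past the last N of a gap
    }
    return @gap_end_pos;
}

# Hypothetical wrapper: turn forward offsets into positions counted
# from the right-hand end (rightmost character is position 0).
sub reverse_gap_ends {
    my ($str) = @_;
    my $len = length $str;
    return sort { $a <=> $b } map { $len - $_ } find_gap_ends($str);
}

for my $str (qw(GNNTCGANNTT GAATCGNNNTT GANNCGNNNNN)) {
    print join( ' ', reverse_gap_ends($str) ), "\n";
}
```

This prints `2 8`, `2` and `0 7` for the three example strings.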
Re^2: Identifying Reverse Position of Specific Character in a String by ikegami (Patriarch) on Aug 25, 2006 at 19:31 UTC
Use `for (shift)` instead of `local $_ = shift` to avoid bugs.
Re^3: Identifying Reverse Position of Specific Character in a String by jdporter (Paladin) on Aug 25, 2006 at 19:56 UTC
There's no benefit to that in this case, since the function makes no attempt to alter `$_`. Furthermore, using the `for(shift)` idiom with code that does modify `$_` is dangerous, unless your intent is to modify `$_[0]`.

As for `pos`, one of the lessons you do (or should) learn early in your Perl edjication is to call `pos` as soon after the regex that sets it as possible, ideally immediately after it. Otherwise, you're taking a gamble.

We're building the house of the future together.
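
The aliasing point can be illustrated with a small sketch. Both subs are hypothetical and written only to contrast the two idioms; they deliberately modify `$_`, which the function under discussion does not.

```perl
use strict;
use warnings;

# local $_ = shift works on a copy, so the caller's string is untouched.
sub strip_gaps_copy {
    local $_ = shift;
    s/N+//g;
    return $_;
}

# for (shift) aliases $_ to the caller's argument, so the substitution
# is expected to change the caller's variable too.
sub strip_gaps_alias {
    for (shift) {
        s/N+//g;
        return $_;
    }
}

my $x = 'GNNTCGANNTT';
my $y = 'GNNTCGANNTT';
strip_gaps_copy($x);
strip_gaps_alias($y);
print "after copy version:  $x\n";
print "after alias version: $y\n";
```

If the aliasing works as described in the reply, `$x` keeps its Ns while `$y` loses them.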
Re^4: Identifying Reverse Position of Specific Character in a String by ikegami (Patriarch) on Aug 25, 2006 at 20:02 UTC
Re^5: Identifying Reverse Position of Specific Character in a String by jdporter (Paladin) on Aug 25, 2006 at 20:18 UTC
Re: Identifying Reverse Position of Specific Character in a String by jwkrahn (Abbot) on Aug 25, 2006 at 20:11 UTC
$ perl -le'
@strings = qw/ GNNTCGANNTT GAATCGNNNTT GANNCGNNNNN /;
for ( @strings ) {
    push @results, [];
    unshift @{ $results[ -1 ] }, length() - pos() while /N+/g;
    }
print "@$_" for @results;
'
2 8
2
0 7
Re: Identifying Reverse Position of Specific Character in a String by chargrill (Parson) on Aug 25, 2006 at 22:21 UTC
If you want a regex-only solution:

#!/usr/bin/perl

use strict;
use warnings;

my @strings = qw( GNNTCGANNTT GAATCGNNNTT GANNCGNNNNN );
my @found;

for my $string( @strings ){
    my $revstring = reverse $string;
    $revstring =~ m<
        (?:
            .*?
            (?{ [ $^R ? @{ $^R } : () , pos ] })
            NN+
        )+
        (?{ @found = @{ $^R }; })
    >x;
    print "String: $string Positions: ", join( ', ', @found ), "\n";
}

Prints:

String: GNNTCGANNTT Positions: 2, 8
String: GAATCGNNNTT Positions: 2
String: GANNCGNNNNN Positions: 0, 7

--chargrill
`$,=42;for(34,0,-3,9,-11,11,-17,7,-5){$.=pack'c'=>$,+=$_}for(reverse s +plit//=>$* ){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$# +C]=$/;($#C >$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$\|;for(@$c){print$_^ +$$C[$%++]}`
Re: Identifying Reverse Position of Specific Character in a String by roboticus (Chancellor) on Aug 25, 2006 at 19:24 UTC
Edward--

Perhaps this'll be of use?

#!/usr/bin/perl -w
use strict;
use warnings;

while (my $seq=<DATA>) {
    print $seq, ':';
    my $cnt=0;
    my $prev=undef;
    my @arr = split //,$seq;    # split the line into single characters
    while (my $c = shift @arr) {
        ++ $cnt;
        if ($c eq 'N') {
            $prev = $cnt;
        }
        else {
            print( $prev, ' ') if $prev;
            $prev = undef;
        }
    }
    print "\n";
}
__DATA__
GNNTCGANNTT
GAATCGNNNTT
GANNCGNNNNN

On my machine, this gives me:

roboticus@swill ~/PerlMonks
$ ./RevGap.pl
GNNTCGANNTT
:3 9
GAATCGNNNTT
:9
GANNCGNNNNN
:4 11

--roboticus