in reply to Re^2: Proper usage of rindex function?
in thread Proper usage of rindex function?

See perlrequick.

Your regexp captures just one character into $1, since the capture group holds a single character class with no quantifier. It is looking for:

    [ATCG]+     # one or more of any of A,T,C,G
                # followed by
    ([ATCG])    # exactly one of A,T,C,G
                # which is captured into $1
                # followed by
    \.+         # one or more dots
                # followed by
    $           # the end of the string

So, ignoring the fact that you have two variables, $last_char and $last_mapped_char, it makes sense that you get a result: the first part of the regexp is greedy and eats up all the letters except the last one, so $1 holds the last letter before the dots. It's probably not how you want to code it, though.
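
To see how the greedy prefix and the capture split the letters, here is a minimal sketch. The extra capture group around the prefix and the helper name split_greedy are illustrative assumptions, not part of the original code:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (greedy prefix, last letter before the dots),
# or an empty list if the string does not match.
sub split_greedy {
    my ($s) = @_;
    # Group 1 wraps the greedy prefix purely for illustration;
    # group 2 is the single-character capture from the original regexp.
    return $s =~ /([ATCG]+)([ATCG])\.+$/ ? ($1, $2) : ();
}

my ($prefix, $last) = split_greedy("ATCGATCG...");
print "prefix: $prefix\n";   # prefix: ATCGATC
print "last:   $last\n";     # last:   G
```

The prefix first grabs all eight letters, then gives one back so that the capture group has something to match.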

Try running some tests:

perl -Mstrict -wE '
    my $str = "ATCGATCG...";
    if ($str =~ /[ATCG]+([ATCG])\.+$/) {
        my $last_char = $1;
        say $last_char;
        my $pos = rindex($str, $last_char);
        say $pos;
    }
'
Output:
G
7

perl -Mstrict -wE '
    my $str = "ATCGATCGA...";
    if ($str =~ /[ATCG]+([ATCG])\.+$/) {
        my $last_char = $1;
        say $last_char;
        my $pos = rindex($str, $last_char);
        say $pos;
    }
'
Output:
A
8

Hope this helps!



The way forward always starts with a minimal test.

Re^4: Proper usage of rindex function?
by Anonymous Monk on Dec 29, 2017 at 12:42 UTC
    Yes, sorry, I copied-pasted it from the actual code, hence the two different variable names...
    I changed it a bit:
    if ($str =~ /([ATCG])\.+$/) {
        $last_mapped_char = $1;
        $rightmost_position_of_letter = rindex($str, $last_mapped_char) + 1;
    }

    and now I think it is more sensible (at least to me). The problem I would have with the solution kindly provided above is that I would always need to compare the 4 values and find the largest one, because I only care about the last A, T, G or C found (it does not matter which) before the dots start.

      If that is your spec, then you should use the special array @- to report the position of the match. You don't need rindex for this (although it is possible that using rindex would be faster, even with the additional step of comparing positions, especially with a long string):

      perl -Mstrict -wE '
          my $str = "ATCGATCGA...";
          if ( $str =~ /([ATCG])\./ ) {
              say "$1 at $-[1]";
          }
      '
      Output:
      A at 8
      See Variables related to regular expressions in perlvar.
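
As a rough sketch of how @- and @+ relate, assuming the same test string: $-[1] is the 0-based offset where group 1 starts and $+[1] is the offset just past its end, so for a one-character capture $+[1] is also the 1-based position of the letter. The helper name last_base_span is made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (letter, start offset, end offset) for the
# first letter that is immediately followed by a dot, or an empty list.
sub last_base_span {
    my ($s) = @_;
    return unless $s =~ /([ATCG])\./;
    return ($1, $-[1], $+[1]);
}

my ($base, $start, $end) = last_base_span("ATCGATCGA...");
print "$base starts at $start and ends before $end\n";   # A starts at 8 and ends before 9
```

Note that this regexp finds the first letter followed by a dot, so it only gives the last letter when the dots appear solely at the end of the string.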

      Hope this helps!

      update: simplified the re


        Great! Thank you very much!
      That assumes the string ends in a dot; it won't find a letter at the final position. Also, the backtracking on the plus modifier is going to kill performance on long strings. You might want to use something like reverse($str) =~ /[ATCG]/.
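
A minimal sketch of the reverse idea, assuming you want a 0-based index in the original string. The helper name last_base_index_rev and the -1 return value for "not found" are assumptions made for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based index of the last A/T/C/G in the string,
# or -1 if there is none.
sub last_base_index_rev {
    my ($s) = @_;
    my $rev = reverse $s;            # scalar context: reverses the characters
    return -1 unless $rev =~ /[ATCG]/;
    # $-[0] is the offset of the first letter in the reversed copy;
    # map it back to an offset in the original string.
    return length($s) - 1 - $-[0];
}

print last_base_index_rev("ATCGATCGA..."), "\n";   # 8
print last_base_index_rev("ATCG"), "\n";           # 3
```

This also handles a letter at the final position, at the cost of copying the string.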

      A regex solution might be fairly efficient, since you are looking for a simple word boundary. But I'd suggest using tr + rindex if making a copy of $str is not a problem. That might be the fastest as well.

      # sub via_rx { $_[0] =~ /.*\b(?<=[ATCG])/s; $+[0] }
      sub via_rx { $_[0] =~ /.*\w\b/s; $+[0] }
      sub via_tr { 1+rindex(($_[0] =~ tr/ATCG/X/r), 'X') }
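
For a quick check, the two subs can be run side by side. This is only a sketch: the test strings are assumptions, and tr///r needs Perl 5.14 or later:

```perl
use strict;
use warnings;
use v5.14;    # the /r flag on tr/// needs 5.14+

sub via_rx { $_[0] =~ /.*\w\b/s; $+[0] }
sub via_tr { 1 + rindex(($_[0] =~ tr/ATCG/X/r), 'X') }

# Both subs return the 1-based position of the last letter:
# 9 for the first string, 4 for the second.
for my $str ("ATCGATCGA...", "ATCG") {
    print "$str: via_rx=", via_rx($str), " via_tr=", via_tr($str), "\n";
}
```

Keep in mind that via_rx relies on \w, so it treats any word character as a letter, and it does not check whether the match succeeded. via_tr returns 0 when the string contains none of A, T, C, G.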