Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Season's greetings fellow monks!
I have a string in the format of:
$str='...............................GAGAACATTAGTGGGTGCAGCGCACAAGCATGGCACATGTATACGTATGTAA..................';

which basically has only A,C,T,G and . inside.
What I want to do is find the rightmost position of the string that has a letter. I tried:
$rightmost_position_of_letter = rindex ($str,[A|T|C|G]);

and
$rightmost_position_of_letter = rindex ($str,[ATCG]);

but both give me -1. What am I doing wrong?

Re: Proper usage of rindex function?
by 1nickt (Canon) on Dec 29, 2017 at 12:19 UTC

    Hi, what makes you think that you can pass a list of substrings to rindex?

    If the substring is not found, index returns -1.

    rindex searches for its second argument as one literal substring; it does not understand character classes or regular expressions. You should check each character individually.

    perl -Mstrict -wE '
        my $str = "GAGAACATTAGTGGGTGCAGCGCACAAGCATGGCACATGTATACGTATGTAA";
        say sprintf q{%s is last found at pos %s}, $_, rindex( $str, $_ ) for qw/A C G T/;
    '
    Output:
    A is last found at pos 51
    C is last found at pos 43
    G is last found at pos 48
    T is last found at pos 49
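
    If what you want is the single rightmost letter, one way to combine the per-character results is to take the largest of them. This is only a sketch: rightmost_of is a made-up helper name, and the shortened test string is mine, not yours.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not part of the original post): returns the position
# of the rightmost character in $str that is one of @letters, or -1 if none
# of them occurs.
sub rightmost_of {
    my ($str, @letters) = @_;
    # rindex returns -1 for a letter that is absent, so taking the maximum
    # still gives -1 when no letter is found at all.
    return max( -1, map { rindex($str, $_) } @letters );
}

my $str = '.....GAGAACATTAG....';
print rightmost_of($str, qw/A C G T/), "\n";
```

    This makes one rindex call per letter, which is fine for a four-letter alphabet.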

    Also, always use strict; in your code: it will make Perl tell you about mistakes you've made:

    perl -Mstrict -wE '
        my $str = "GAGAACATTAGTGGGTGCAGCGCACAAGCATGGCACATGTATACGTATGTAA";
        say rindex( $str, [ATCG] )
    '
    Output:
    Bareword "ATCG" not allowed while "strict subs" in use at -e line 3.
    Execution of -e aborted due to compilation errors.
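
    As for why you got -1 instead of an error: without strict, the square brackets build an anonymous array holding the bareword as a string, and rindex then searches for the stringified reference. A small sketch of that, with the bareword quoted so it compiles under strict; the variable names are only for illustration:

```perl
use strict;
use warnings;

my $str    = "GAGAACATTAG";
my $needle = ['ATCG'];    # roughly what [ATCG] means without strict: an array reference

# In string context a reference looks like "ARRAY(0x...)",
# and that text does not occur anywhere in $str.
print "searching for: $needle\n";
print rindex($str, $needle), "\n";    # prints -1
```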

    Hope this helps!


    The way forward always starts with a minimal test.

      Aha, I see, I thought it would work... I tried this:
      if($str=~/[ATCG]+([ATCG])\.+$/) {
          $last_char=$1;
          $rightmost_position_of_letter = rindex($str, $last_mapped_char);
      }

      and it seems to work. Does that make sense?

        See perlrequick.

        Your regexp is capturing into $1 just one character, since you are using a character class. It is looking for:

        [ATCG]+     # one or more of any of A,T,C,G
                    # followed by
        ([ATCG])    # exactly one of A,T,C,G
                    # which is captured into $1
                    # followed by
        \.+         # one or more dots
                    # followed by
        $           # the end of the string

        So, ignoring the fact that you have two variables, $last_char and $last_mapped_char, it makes sense that you get a result: the quantifier is greedy, so the first part of the regexp eats up all the letters except the last one, which is what gets captured. rindex then finds the last occurrence of that character, which is the last letter in the string. It's probably not how you want to code it, though.

        Try running some tests:

        perl -Mstrict -wE '
            my $str = "ATCGATCG...";
            if($str=~/[ATCG]+([ATCG])\.+$/) {
                my $last_char=$1;
                say $last_char;
                my $pos = rindex($str, $last_char);
                say $pos;
            }
        '
        G
        7

        perl -Mstrict -wE '
            my $str = "ATCGATCGA...";
            if($str=~/[ATCG]+([ATCG])\.+$/) {
                my $last_char=$1;
                say $last_char;
                my $pos = rindex($str, $last_char);
                say $pos;
            }
        '
        A
        8
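
        If you would rather skip the capture-then-rindex round trip, one possible alternative is to let the regexp engine report the position itself through the @+ array. Again just a sketch, and last_letter_pos is a name I made up:

```perl
use strict;
use warnings;

# Hypothetical helper: position of the last A/C/G/T in $str, or -1 if there
# is none.
sub last_letter_pos {
    my ($str) = @_;
    # The greedy .* runs to the end of the string and backtracks until the
    # next character is a letter, so the match ends on the last letter.
    # $+[0] is the offset just past the end of the whole match.
    return $str =~ /.*[ACGT]/s ? $+[0] - 1 : -1;
}

print last_letter_pos("ATCGATCG..."), "\n";    # same string as the first test
print last_letter_pos("ATCGATCGA"), "\n";      # no trailing dots required
```

        Unlike the regexp above, this does not require trailing dots or at least two letters in order to match.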

        Hope this helps!


