Re^3: Proper usage of rindex function?

Your regexp captures only one character into $1, because the capture group holds a single character class with no quantifier. It is looking for:

[ATCG]+  # one or more of any of A,T,C,G
         # followed by
([ATCG]) # exactly one of A,T,C,G
         # which is captured into $1
         # followed by
\.+      # one or more dots
         # followed by
$        # the end of the string

So, setting aside the fact that you have two different variables, $last_char and $last_mapped_char, it makes sense that you get the result you expect. The first part of the pattern, [ATCG]+, is greedy: it eats up all the letters and then gives back only the last one so that ([ATCG]) can match it. The captured letter is therefore the last letter before the dots, and rindex then finds the last occurrence of that letter, which is the same position. It's probably not how you want to code it, though.
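A quick way to check this reasoning is to compare the offset where the match put the captured letter with what rindex reports. Perl's special array `@-` holds match start offsets, so `$-[1]` is where capture group 1 starts. In the sketch below the helper name capture_vs_rindex and the sample strings are made up for illustration; the pattern is the one from the question, only rewritten with the /x modifier so each piece can carry a comment.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: the pattern from the question, rewritten with /x so
# each piece carries a comment. Returns the offset where the match put the
# captured letter and the offset rindex reports for that same letter.
sub capture_vs_rindex {
    my ($str) = @_;
    return unless $str =~ /
        [ATCG]+    # greedy: takes every letter, then gives one back
        ([ATCG])   # the single letter that was given back
        \.+        # one or more literal dots
        $          # end of the string
    /x;
    # $-[1] is the start offset of capture group 1
    return ($-[1], rindex($str, $1));
}

for my $str ("ATCGATCG...", "ATCGATCGA...", "GGGA.") {
    my ($captured_at, $rindex_at) = capture_vs_rindex($str);
    say "$str: captured at $captured_at, rindex says $rindex_at";
}
```

For these inputs the two offsets agree: the captured letter is the last letter in the string, so its last occurrence is itself, and the rindex call only recomputes something the match already knows.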

Try running some tests:

perl -Mstrict -wE '
my $str = "ATCGATCG...";
if($str=~/[ATCG]+([ATCG])\.+$/) {
   my $last_char=$1;
   say $last_char;
   my $pos = rindex($str, $last_char);
   say $pos;
}
'
G
7

perl -Mstrict -wE '
my $str = "ATCGATCGA...";
if($str=~/[ATCG]+([ATCG])\.+$/) {
   my $last_char=$1;
   say $last_char;
   my $pos = rindex($str, $last_char);
   say $pos;
}
'
A
8
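The two one-liners can be folded into a small table-driven test with the core Test::More module, so each new case is one line. This is only a sketch: last_char_and_pos is a hypothetical wrapper around the regex + rindex code above, and the extra cases are invented.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical wrapper around the regex + rindex one-liners above.
# Returns (letter, 0-based position), or an empty list if nothing matches.
sub last_char_and_pos {
    my ($str) = @_;
    return unless $str =~ /[ATCG]+([ATCG])\.+$/;
    my $last_char = $1;
    return ($last_char, rindex($str, $last_char));
}

# Each case: input string, expected letter, expected position.
my @cases = (
    [ "ATCGATCG...",  "G", 7 ],
    [ "ATCGATCGA...", "A", 8 ],
    [ "TTTTC.",       "C", 4 ],
);

for my $case (@cases) {
    my ($str, $want_char, $want_pos) = @$case;
    is_deeply( [ last_char_and_pos($str) ], [ $want_char, $want_pos ], $str );
}

# Only one letter before the dots: no match, so an empty list.
is_deeply( [ last_char_and_pos("A...") ], [], "A... does not match" );

done_testing();
```

The last case shows a limit of this pattern: with only one letter before the dots it does not match at all, since [ATCG]+ and ([ATCG]) each need a letter of their own.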

Hope this helps!

The way forward always starts with a minimal test.


Re^4: Proper usage of rindex function? by Anonymous Monk on Dec 29, 2017 at 12:42 UTC
Yes, sorry, I copy-pasted it from the actual code, hence the two different variable names... I changed it a bit: `if($str=~/([ATCG])\.+$/) { $last_mapped_char=$1; $rightmost_position_of_letter = rindex($str, $last_mapped_char)+1; }` and now I think it is more sensible (at least to me). The problem I would have with the solution kindly provided above is that I would always need to compare the 4 values and find the largest one, because I only care about the last A, T, G or C found (it does not matter which) before the dots start.
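For comparison, here is a sketch of both ideas side by side: the revised regex + rindex from this post, and the "compare the 4 values" approach done with max from the core List::Util module, taking one rindex per base. Both return a 1-based position, following the +1 in the snippet; the sub names and the undef-for-no-letter convention are assumptions made for the example.

```perl
use strict;
use warnings;
use feature 'say';
use List::Util qw(max);

# The revised approach from the post: capture the letter right before the
# trailing dots, then rindex it. Returns a 1-based position, or undef.
sub via_regex_rindex {
    my ($str) = @_;
    return undef unless $str =~ /([ATCG])\.+$/;
    return rindex($str, $1) + 1;
}

# The "compare the 4 values" approach: one rindex per base, keep the
# largest. rindex returns -1 for a missing base, so -1 means no letter.
sub via_four_rindex {
    my ($str) = @_;
    my $pos = max map { rindex($str, $_) } qw(A T C G);
    return $pos < 0 ? undef : $pos + 1;
}

for my $str ("ATCGATCG...", "ATCGATCGA...", "GATTACA") {
    my $r = via_regex_rindex($str);
    my $f = via_four_rindex($str);
    say "$str: regex=", $r // 'none', ", four rindex=", $f // 'none';
}
```

The last sample has no dots, so the regex version finds nothing while the four-rindex version still reports the final A; which behaviour is right depends on whether input without trailing dots can occur.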
Re^5: Proper usage of rindex function? by 1nickt (Canon) on Dec 29, 2017 at 12:54 UTC
If that is your spec, then you should use the special array `@-` to report the position of the match. You don't need `rindex` for this (although it is possible that using `rindex` would be faster, even with the additional step of comparing positions, especially with a long string):

perl -Mstrict -wE '
my $str = "ATCGATCGA...";
if ( $str =~ /([ATCG])\./ ) {
    say "$1 at $-[1]";
}
'

Output:

A at 8

See "Variables related to regular expressions" in perlvar.

Hope this helps!

Update: simplified the regex.
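Besides `@-`, Perl fills `@+` with the offsets where the match and each group end. A sketch of a hypothetical helper around the pattern above shows both: `$-[1]` is the 0-based index of the letter, and `$+[1]` is one past it, which is the same 1-based number the earlier snippet computed with rindex(...)+1. Like the one-liner, it assumes all the letters come before the dots.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper around the pattern above: find the first A/T/C/G
# that is followed by a dot and report its offsets from @- and @+.
sub letter_before_dots {
    my ($str) = @_;
    return unless $str =~ /([ATCG])\./;
    # $-[1]: where capture group 1 starts, the 0-based index of the letter
    # $+[1]: where capture group 1 ends, i.e. that index plus one, which is
    #        the 1-based position the earlier snippet got from rindex(...)+1
    return ($1, $-[1], $+[1]);
}

my ($char, $start, $end) = letter_before_dots("ATCGATCGA...");
say "$char at $start (1-based: $end)";
```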
Re^6: Proper usage of rindex function? by Anonymous Monk on Dec 29, 2017 at 13:06 UTC
Great! Thank you very much!
Re^5: Proper usage of rindex function? by Anonymous Monk on Dec 29, 2017 at 16:35 UTC
That assumes the string ends in a dot; it won't find a letter at the final position. Also, backtracking on the `+` quantifier is going to kill performance on long strings. You might want to use something like `reverse($str) =~ /[ATCG]/` instead.
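The reverse() idea needs one extra step: the offset of the first letter in the reversed string has to be mapped back to an index in the original. A minimal sketch, assuming a 0-based result and -1 when there is no letter (mirroring rindex); the sub name is hypothetical.

```perl
use strict;
use warnings;
use feature 'say';

# Sketch of the reverse() idea: search the reversed string for the first
# letter, then map that offset back to an index in the original string.
# Returns the 0-based index of the last A/T/C/G, or -1 if there is none.
sub last_base_index_via_reverse {
    my ($str) = @_;
    my $rev = reverse $str;            # scalar context: reverses the characters
    return -1 unless $rev =~ /[ATCG]/;
    return length($str) - 1 - $-[0];   # $-[0]: offset of the match in $rev
}

say last_base_index_via_reverse($_) for "ATCGATCGA...", "GATTACA", "....";
```

This also handles a letter in the final position. The reversal copies the string once, but the search then stops at the first letter it meets in the reversed copy.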
Re^5: Proper usage of rindex function? by Anonymous Monk on Dec 29, 2017 at 21:33 UTC
A regex solution might be fairly efficient, since you are looking for a simple word boundary. But I'd suggest using tr + rindex if making a copy of `$str` is not a problem. That might be the fastest as well.

# sub via_rx { $_[0] =~ /.\b(?<=[ATCG])/s; $+[0] }
sub via_rx { $_[0] =~ /.\w\b/s; $+[0] }
sub via_tr { 1+rindex(($_[0] =~ tr/ATCG/X/r), 'X') }
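One way to settle the speed question is to measure it with the core Benchmark module. A sketch, assuming a made-up test string of 10,000 bases followed by 1,000 dots; via_rx and via_tr are copied from the post, and both return the 1-based position of the last letter. Note that tr///r needs Perl 5.14 or later, that via_rx as posted does not check whether the match succeeded, and that \w also matches letters other than A, T, C and G (the commented-out variant restricts it with a lookbehind).

```perl
use strict;
use warnings;
use feature 'say';
use Benchmark qw(cmpthese);

# The two subs from the post: both return the 1-based position of the
# last letter before the dots (the same convention as rindex(...) + 1).
# tr///r (non-destructive transliteration) requires Perl 5.14+.
sub via_rx { $_[0] =~ /.\w\b/s; $+[0] }
sub via_tr { 1 + rindex(($_[0] =~ tr/ATCG/X/r), 'X') }

# Hypothetical test string: 10_000 bases followed by 1_000 dots.
my $str = ('ATCG' x 2_500) . ('.' x 1_000);

# Sanity check first: both approaches should agree before timing them.
die "mismatch" unless via_rx($str) == via_tr($str);
say "last base at 1-based position ", via_tr($str);

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    via_rx => sub { via_rx($str) },
    via_tr => sub { via_tr($str) },
});
```

Timings depend on the perl version and the data, so treat the table cmpthese prints as a hint for your own inputs rather than a general answer.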