Re: Finding Start/End Position of the Uppercase Substring

To cope with the variable number of leading hyphens, and with counting from 1 rather than 0, I decided to replace zero or more hyphens at the beginning of the string with a single underscore. Because the underscore occupies offset 0, the first character after the hyphens lands at offset 1. A zero-based offset in the modified string therefore equals the one-based position, ignoring the leading hyphens, that the OP wanted. I also used look-arounds and regex code blocks. The code blocks caused me problems until I realised that they had created closures around $str, $startPos and $endPos while those were lexical variables. Declaring them with local our got things working.
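
A small illustration of the underscore trick on its own, separate from the look-arounds. The string and the variable names ($sample, $shifted, $first_upper) are made up for this example, and index() with a literal 'T' stands in for the uppercase search purely to show the offset arithmetic:

```perl
use strict;
use warnings;

# Hypothetical sample string, made up for this illustration.
my $sample = '--aatTTTG';

# Replace however many leading hyphens there are (possibly none) with one
# underscore; working on a copy leaves $sample itself unchanged.
( my $shifted = $sample ) =~ s{\A-*}{_};

# index() is zero-based, but the underscore sits at offset 0, so this is
# the one-based position of the first 'T' counted without the hyphens.
my $first_upper = index $shifted, 'T';
print "$shifted $first_upper\n";    # prints "_aatTTTG 4"
```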

use strict;
use warnings;

my @strings = qw{
   ccaatTTTGACACACACAGAAgggca
   --aatTTTGACACACACAGAAgggca
   --aatTTTGACACACACAGAA
   ---aagctaagattca
   TTTGACACACACAGAAgggca
   ---TTTGACACACACAGAAgggca
   };

foreach my $string ( @strings )
{
    my ($sp, $ep) = ucRange($string);
    print
       qq{        String - $string\n},
       qq{Start position - $sp\n},
       qq{  End position - $ep\n\n};
}

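sub ucRange
{
    # Package variables rather than lexicals, so that the (?{ }) code
    # blocks below do not close over a stale copy of them.
    local our $str = shift;

    # Replace any leading hyphens (possibly none) with one underscore, so a
    # zero-based offset in $str is the one-based, hyphen-free position.
    $str =~ s{\A-*}{_};

    # Start: the point where a lowercase letter or the underscore is
    # followed by an uppercase letter. Stays 0 if there is no such point.
    local our $startPos = 0;
    $str =~ m{(?<=[a-z_])(?=[A-Z])(?{$startPos = pos $str})};

    # End: the point just before the last uppercase letter of the run, the
    # one followed by a lowercase letter or the end of the string.
    local our $endPos = 0;
    $str =~ m{(?<=[A-Z])(?=[A-Z](?:[a-z]|\z))(?{$endPos = pos $str})};

    return ($startPos, $endPos);
}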

The output:

        String - ccaatTTTGACACACACAGAAgggca
Start position - 6
  End position - 21

        String - --aatTTTGACACACACAGAAgggca
Start position - 4
  End position - 19

        String - --aatTTTGACACACACAGAA
Start position - 4
  End position - 19

        String - ---aagctaagattca
Start position - 0
  End position - 0

        String - TTTGACACACACAGAAgggca
Start position - 1
  End position - 16

        String - ---TTTGACACACACAGAAgggca
Start position - 1
  End position - 16
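
For comparison, here is a sketch of a different technique that avoids code blocks altogether: match the uppercase run directly and read its offsets from the @- and @+ arrays. This is not the method used above; the sub name uc_range_offsets is hypothetical, and it assumes, as in the OP's data, a single uppercase run (only the first one is reported).

```perl
use strict;
use warnings;

# Alternative sketch (not the look-around/code-block approach above).
# Assumes at most one uppercase run; only the first run is reported.
sub uc_range_offsets {
    my ($str) = @_;

    # Count the leading hyphens so they can be left out of the positions.
    my ($hyphens) = $str =~ m{\A(-*)};
    my $lead = length $hyphens;

    # No uppercase at all: report (0, 0), as ucRange does.
    return ( 0, 0 ) unless $str =~ m{[A-Z]+};

    # $-[0] is the zero-based offset where the match starts and $+[0] the
    # offset just past its end; convert to one-based, hyphen-free positions.
    return ( $-[0] - $lead + 1, $+[0] - $lead );
}

for my $string (qw{ --aatTTTGACACACACAGAAgggca ---aagctaagattca aaTaa }) {
    my ( $sp, $ep ) = uc_range_offsets($string);
    print "$string: $sp-$ep\n";
}
```

For the six strings in the script it gives the same start and end positions as the output above. It also reports a one-letter uppercase run, such as the T in aaTaa (3-3), whereas the end-position pattern in ucRange needs at least two consecutive uppercase letters and so leaves the end at 0 in that case.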

I hope this is of interest.

Cheers,

JohnGG

Update: Added a string with no uppercase letters to check that the script handles that case.
