shu_uemura has asked for the wisdom of the Perl Monks concerning the following question:

I have a question about replacing specific characters in a string based on their position in the string.

For example, I have

atcgcgtacatcgatac

and I want to make the ninth character from the left upper-case instead of lower-case, i.e., it should look like

atcgcgtaCatcgatac

after I am done. Furthermore, I have many sequences and various positions for each sequence where the casing of letters needs to be changed. How might I do this?

P.S. I have tried substr, but I can't fit my parameters into the arguments.

Replies are listed 'Best First'.
Re: String character replacement based on position
by Marshall (Canon) on Feb 23, 2011 at 05:34 UTC
    Is this what you need? You can use substr on the left of the equals.
    #!/usr/bin/perl -w
    use strict;

    my $str = 'atcgcgtacatcgatac';
    substr($str, 8, 1) = uc(substr($str, 8, 1));
    #substr($str, -9, 1) = uc(substr($str, -9, 1)); # coincidence that this is the same
    print $str; # prints atcgcgtaCatcgatac
    Update: I misread the direction the first time; ikegami got it right. In this example the result happens to be the same whether you count from the left or from the right. Use positive offsets to count from the left, where 0 (not 1) is the first character, and negative offsets to count from the right, where -1 is the last character. The "trick" here is using substr() on the left of the equals sign.
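A minimal sketch of the offset rules described above, assuming a hypothetical helper named uc_char_at (not part of the original post). It works on a copy of the string, so the caller's variable is left alone:

```perl
use strict;
use warnings;

# Hypothetical helper: upper-case one character at a 0-based offset.
# A negative offset counts from the right, as substr itself does.
sub uc_char_at {
    my ($str, $offset) = @_;
    # substr on the left of "=" replaces that part of $str in place.
    substr($str, $offset, 1) = uc substr($str, $offset, 1);
    return $str;
}

my $str = 'atcgcgtacatcgatac';        # 17 characters
print uc_char_at($str, 0),  "\n";     # first character from the left
print uc_char_at($str, 8),  "\n";     # ninth character from the left
print uc_char_at($str, -1), "\n";     # last character
```

Because the string is 17 characters long, offsets 8 and -9 address the same character, which is why the two variants in the post give the same result.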

    substr EXPR,OFFSET,LENGTH
    If you need more than one character, adjust the LENGTH. BrowserUk's solution is fine also. I suspect the straightforward substr() solution is faster, because the right-hand side substr() does not modify the string, just returns the character specified and uc() is a very fast critter. If in doubt, benchmark.
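If you do want to benchmark, something along these lines might serve as a starting point. It uses the core Benchmark module; the sub names via_substr and via_tr are made up for this sketch, and no timing result is claimed here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $seq = 'atcgcgtacatcgatac';

# Each sub works on its own copy so the runs do not affect each other.
sub via_substr { my $s = $seq; substr($s, 8, 1) = uc substr($s, 8, 1); return $s }
sub via_tr     { my $s = $seq; substr($s, 8, 1) =~ tr/acgt/ACGT/;      return $s }

# Run each candidate for about one CPU second and print a comparison table.
cmpthese(-1, {
    substr_uc => \&via_substr,
    substr_tr => \&via_tr,
});
```

The numbers will depend on the Perl version and the machine, so measure with strings and offsets that resemble your real data.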

      For case modifications, tr/// is useful as it operates in place:

      $s = 'atcgcgtacatcgatac';
      substr( $s, $_, 1 ) =~ tr[acgt][ACGT] for 3,6,9,12,15;
      print $s;
      # prints atcGcgTacAtcGatAc

      That's the 9th character from the right. The 9th character from the left is at index 8.
      substr($str,8,1) = uc(substr($str,8,1));
      or
      $_ = uc for substr($str,8,1);

      Update: Probably should ignore the latter. Harder to read, surely slower and no real benefit.

      Yes! That's exactly what I needed. Thanks so much.
Re: String character replacement based on position
by jwkrahn (Abbot) on Feb 23, 2011 at 08:21 UTC
    $ perl -le'
    $_ = q/atcgcgtacatcgatac/;
    print;
    s/(?<=.{8})(.)/\u$1/s;
    print;
    '
    atcgcgtacatcgatac
    atcgcgtaCatcgatac

      Or, using pos() and \G

      $ perl -le'
      $_ = q/atcgcgtacatcgatac/;
      print;
      pos = 8;
      s/\G(.)/\u$1/;
      print;
      '
      atcgcgtacatcgatac
      atcgcgtaCatcgatac
Re: String character replacement based on position
by davido (Cardinal) on Feb 23, 2011 at 05:38 UTC

    Since I'm not much of an expert in genome problems (which is what you're working on, right?), I need more information to understand where you're having difficulty. Could you show a few example rules that you're having trouble implementing? substr is a pretty useful tool for string manipulation that must occur at absolute positions. Some more complex manipulations based on in-string triggers may benefit from regexp implementations. But (I can't speak for everyone)... I need more info to know where the problem is. Converting the ninth character from the left of a string to upper case is probably not the problem you're having trouble solving, is it?


    Dave

      @"The case of converting the ninth character from the left of a string to upper case is probably not the problem you're having trouble solving, is it?"

      Yes, the problem I'm having is how to develop a general solution for changing the case of specific letters in a DNA sequence. So, I have many variable sequence positions where the letter case needs to be changed. Furthermore, letter case changes must sometimes occur more than once per DNA sequence.
        ... more than once per DNA sequence.

        Maybe:

        >perl -wMstrict -le
        "my $seq = 'atcgcgtacatcgatac';
         my $rule = '01234567890123456';
         my @uc_offsets = (0, 8, 15);
         ;;
         print qq{$rule};
         print qq{$seq};
         for my $offset (@uc_offsets) {
           substr($seq, $offset, 1) = uc substr $seq, $offset, 1;
           }
         print qq{$seq};
        "
        01234567890123456
        atcgcgtacatcgatac
        AtcgcgtaCatcgatAc

        Easy to put that into a function.

        Update: Looking back, I see that BrowserUk essentially gave this solution already, so... Never mind.
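One possible way to wrap this in a function, as a sketch only. The name uc_positions, the use of 1-based positions ("ninth character from the left"), the %todo layout and the second sample sequence are all assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $seq with the characters at the
# given 1-based positions upper-cased.
sub uc_positions {
    my ($seq, @positions) = @_;
    for my $pos (@positions) {
        # Skip positions that fall outside the sequence.
        next if $pos < 1 || $pos > length $seq;
        substr($seq, $pos - 1, 1) = uc substr($seq, $pos - 1, 1);
    }
    return $seq;
}

# Assumed input layout: sequence => list of 1-based positions.
my %todo = (
    atcgcgtacatcgatac => [ 1, 9, 16 ],
    ggattaca          => [ 3, 4 ],
);

for my $seq (sort keys %todo) {
    print uc_positions($seq, @{ $todo{$seq} }), "\n";
}
```

Skipping out-of-range positions is a design choice for this sketch; depending on the data it may be better to die or warn so that bad input does not pass unnoticed.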

Re: String character replacement based on position
by rustic (Monk) on Feb 23, 2011 at 13:18 UTC

    Hello, here is the same thing with another use of substr(): the 4th argument (REPLACEMENT).

    $_ = substr $str, 8, 1;   # copy the character into $_
    substr $str, 8, 1, uc;    # uc with no argument operates on $_
    print "$str\n";
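A small sketch of the same 4-argument form without going through $_. The variable name $old is only for illustration; the 4-argument substr replaces the span in place and returns the text that was there before:

```perl
use strict;
use warnings;

my $str = 'atcgcgtacatcgatac';

# 4-argument form: substr EXPR, OFFSET, LENGTH, REPLACEMENT.
# It replaces the span in $str and returns what was there before.
my $old = substr($str, 8, 1, uc substr($str, 8, 1));

print "replaced '$old'\n";
print "$str\n";
```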