Ekolet has asked for the wisdom of the Perl Monks concerning the following question:

while($text =~ m/(TATAAT|TTGACAT)\w+[^ATG]?(ATG\w+)[^TAG|TAA|TGA]?/g) { $2=~tr/ATGC/UACG/; print $2; }

Hello, I just started to learn Perl and just discovered this forum. What I am trying to do is find genes on a DNA strand and translate them into RNA. (For non-biology speakers: I am trying to take the part of the string captured by the regex and translate all A's into U's, etc.) But when I try to modify $2, Perl tells me "modification of a read only value at c://blabla....". Any ideas how I can do this? Thanks in advance.

Replies are listed 'Best First'.
Re: Modification of read only values.
by JavaFan (Canon) on Mar 17, 2012 at 18:39 UTC
    You cannot modify the numbered capture variables ($1, $2, ...); they are read-only. Make a copy first:
        while ($text =~ m/(TATAAT|TTGACAT)\w+[^ATG]?(ATG\w+)[^TAG|TAA|TGA]?/g) {
            my $match = $2;
            $match =~ tr/ATGC/UACG/;
            print $match;
        }
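
For anyone who wants to try the copy-first approach, here is a minimal self-contained sketch. The sample strand and the sub name transcribe_matches are made up for illustration; the regex is the OP's, left unchanged.

```perl
use strict;
use warnings;

# Hypothetical sample strand, invented for this example only.
my $text = 'CCTATAATGGCCATGAAACCC';

# Collect the transliterated copy of every $2 the OP's regex captures.
sub transcribe_matches {
    my ($dna) = @_;
    my @rna;
    while ($dna =~ m/(TATAAT|TTGACAT)\w+[^ATG]?(ATG\w+)[^TAG|TAA|TGA]?/g) {
        my $match = $2;            # copy, because $2 itself is read-only
        $match =~ tr/ATGC/UACG/;   # transliterate the copy in place
        push @rna, $match;
    }
    return @rna;
}

print "$_\n" for transcribe_matches($text);
```

On Perl 5.14 or later, the non-destructive form my $match = $2 =~ tr/ATGC/UACG/r; should give the same result without an explicit copy, since /r returns a new string and leaves $2 alone.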
Re: Modification of read only values.
by AnomalousMonk (Archbishop) on Mar 17, 2012 at 19:09 UTC

    In addition, the inverted character set  [^TAG|TAA|TGA] in the OP regex does not do what I think you think it does. The | (pipe) character has no special meaning in a character set, nor does repetition of a character have significance. The set above is equivalent to the  [^|TAG] set.
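
A quick way to convince yourself of this equivalence is to test both classes against single characters. This is only a sketch: the sub name classes_agree and the list of test characters are chosen for illustration.

```perl
use strict;
use warnings;

# Inside [...] the pipe is a literal character and repeated letters add
# nothing, so both classes should accept and reject the same characters.
my $op_class    = qr/[^TAG|TAA|TGA]/;
my $equiv_class = qr/[^|TAG]/;

# True when both classes give the same verdict for one character.
sub classes_agree {
    my ($char) = @_;
    my $in_op    = ($char =~ $op_class)    ? 1 : 0;
    my $in_equiv = ($char =~ $equiv_class) ? 1 : 0;
    return $in_op == $in_equiv;
}

for my $char ('A', 'C', 'G', 'T', '|', 'N') {
    printf "%s: OP class %s, short class %s\n", $char,
        ($char =~ $op_class    ? 'matches' : 'rejects'),
        ($char =~ $equiv_class ? 'matches' : 'rejects');
}
```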

    My guess about what you originally intended is something along the lines of "not followed by any of the sub-sequences TAG, TAA or TGA". This can be achieved by the negative look-ahead assertion
        (?! TAG | TAA | TGA)
    (assuming use of the /x regex modifier).
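
As a rough illustration of that look-ahead, the sketch below checks whether a string begins with one of the three stop codons. The sub name not_at_stop_codon and the sample strings are assumptions made for the example, and the \A anchor is added only so the test applies to the start of the string.

```perl
use strict;
use warnings;

# Returns 1 when the string does NOT begin with TAG, TAA or TGA.
# The look-ahead consumes nothing; it only vetoes the match.
sub not_at_stop_codon {
    my ($seq) = @_;
    return $seq =~ m/ \A (?! TAG | TAA | TGA ) /x ? 1 : 0;
}

for my $seq ('TAGCCC', 'TGACCC', 'TACCCC', 'ATGAAA') {
    printf "%s: %s\n", $seq,
        not_at_stop_codon($seq) ? 'no stop codon at start' : 'stop codon at start';
}
```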

    However, the original expression was  [^TAG|TAA|TGA]? (or its equivalent [^|TAG]?) — note the ? quantifier — meaning "Some character other than  | T A G must be present — or not. Whatever." The ? quantifier makes the whole thing optional either in its character set form or as a negative look-ahead.

    On The Other Hand, the OP m// regex used the /g modifier, so the intent may have been something like "step over any three bases other than a TAG, TAA or TGA sub-sequence, if present", which could be achieved by
        (?: (?! TAG | TAA | TGA) ...)?
    (again, with the /x modifier), but this is reading a lot into Ekolet's OP.
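
If the goal really is to walk along the strand three bases at a time, one possible variation is to repeat that group with * instead of ?, so the match runs from ATG up to the first in-frame stop codon. This goes beyond what the reply proposes and is only a guess at the intent. The sub name open_reading_frame is hypothetical, and [ACGT]{3} is used here in place of ... as a stricter assumption about the input.

```perl
use strict;
use warnings;

# Returns the bases from the first ATG up to (but not including) the first
# in-frame TAG, TAA or TGA, or undef when the string contains no ATG.
sub open_reading_frame {
    my ($dna) = @_;
    return $dna =~ m/ ( ATG (?: (?! TAG | TAA | TGA ) [ACGT]{3} )* ) /x
        ? $1
        : undef;
}

# Hypothetical sample strand, invented for this example only.
my $frame = open_reading_frame('CCATGAAACCCTAGGG');
print defined $frame ? "$frame\n" : "no start codon\n";
```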

    Update: Furthermore, the  \w+[^ATG]? sub-expression is suspect. I don't understand the intention of this one either if one assumes matching against a string consisting only of A T C G characters or, more generally, only \w characters; in either case, the  \w+ will 'consume' anything that might be matched by the optional  [^ATG]? which is thus rendered effectless.
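
To see what each piece of that sub-expression actually grabs, it can help to capture the two parts separately. A minimal sketch follows; the sub name split_match and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Capture separately what \w+ and the optional [^ATG]? matched.
sub split_match {
    my ($seq) = @_;
    return $seq =~ m/\A(\w+)([^ATG]?)/ ? ($1, $2) : ();
}

for my $seq ('GGCCATGAAACCC', 'GGCC-ATG') {
    my ($word_part, $optional_part) = split_match($seq);
    printf "%s: \\w+ took '%s', [^ATG]? took '%s'\n",
        $seq, $word_part, $optional_part;
}
```

With a string made only of \w characters, the greedy \w+ takes everything and the optional class is left with the empty string. The class only gets a character when something outside \w, such as the '-' in the second sample, stops \w+ first.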

Re: Modification of read only values.
by CountZero (Bishop) on Mar 17, 2012 at 18:39 UTC
    Add a my $gene = $2; at the top of your loop and work with $gene from there on.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics