Also, I note a reference to codons, which implies that your tests should be considering a stride of 3 rather than an arbitrary position.

This is an excellent point. For the benefit of the OP, here is one way to ensure that only whole codons (triplets starting from the ATG) are captured:

#! perl
use strict;
use warnings;

my $seq = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAACGAA';

# Adapted from the regex by stevieb
my $re = qr{
             (                          # capture each sequence:
               ATG                      # - which begins with the codon ATG
               (?: [ACGT]{3} )*?        # - followed by the smallest number of codons
               (?: TAG | TAA | TGA )    # - and ending with the codon TAG, TAA, or TGA
             )
           }x;

print "$1\n" while $seq =~ /$re/g;

(This assumes that only minimal sequences are wanted — an assumption which should be clarified, as Laurent_R has pointed out, above.)
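To make the "minimal sequences" assumption concrete, here is a small sketch that runs the same pattern with a lazy and with a greedy quantifier on the interior codons. The helper name find_orfs and its $quant parameter are invented for this illustration; they are not part of the code above.

```perl
#! perl
use strict;
use warnings;

my $seq = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAACGAA';

# find_orfs() is a hypothetical helper for this sketch only.
# $quant is the quantifier applied to the interior codons:
#   '*?' stops at the first in-frame stop codon (minimal match),
#   '*'  runs on to the last in-frame stop codon (maximal match).
sub find_orfs {
    my ($dna, $quant) = @_;
    my $re = qr{ ( ATG (?: [ACGT]{3} )${quant} (?: TAG | TAA | TGA ) ) }x;
    # In list context, m//g returns the captured text of every match.
    my @found = $dna =~ /$re/g;
    return @found;
}

print "minimal: $_\n" for find_orfs($seq, '*?');
print "maximal: $_\n" for find_orfs($seq, '*');
```

On this particular sequence the lazy version reports two sequences, while the greedy version reports a single longer one, because the second ATG happens to lie in the same reading frame as the first and is swallowed as an ordinary interior codon. Which behaviour is wanted depends on the OP's requirements.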

Hope that helps,

Athanasius <°(((>< contra mundum
Iustus alius egestas vitae, eros Piratica,

Re^3: Regular expressions
by AnomalousMonk (Archbishop) on Oct 27, 2015 at 07:27 UTC

    I would have organized the code slightly differently, factoring each of the pattern elements into a separate qr// regex object and combining them (inside a capture group) in the final m// match:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $codon = qr{ [ACGT]{3} }xms;
     my $start = qr{ ATG }xms;
     my $end   = qr{ TAG | TAA | TGA }xms;
     ;;
     my $seq = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAACGAA';
     ;;
     print qq{'$1'} while $seq =~ m{ ($start $codon*? $end) }xmsg;
    "
    'ATGGTTTCTCCCATCTCTCCATCGGCATAA'
    'ATGATCTAA'

    Separate qr// definitions ease maintenance and, if variable names be wisely chosen, are self-commenting. If possible, I only use capture groups in the final m// match due to the confusion that trying to count nested capture groups can produce.
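
If more than one piece of the match is needed, one possible way to sidestep counting groups altogether is to use named captures (available from Perl 5.10) in the final match. The following is only a sketch of that variation, not the approach shown above: the helper name orfs_with_stops and the group names orf and stop are invented for the illustration.

```perl
#! perl
use strict;
use warnings;

# Capture-free building blocks, as in the one-liner above.
my $codon = qr{ [ACGT]{3} }xms;
my $start = qr{ ATG }xms;
my $end   = qr{ TAG | TAA | TGA }xms;

# orfs_with_stops() is a hypothetical helper for this sketch only.
# It returns one hash reference per match, holding the whole sequence
# and the stop codon that terminated it.
sub orfs_with_stops {
    my ($dna) = @_;
    my @hits;
    while ($dna =~ m{ (?<orf> $start $codon*? (?<stop> $end ) ) }xmsg) {
        # %+ holds the named captures of the last successful match.
        push @hits, { orf => $+{orf}, stop => $+{stop} };
    }
    return @hits;
}

my $seq = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAACGAA';
printf "'%s' (stop codon %s)\n", $_->{orf}, $_->{stop}
    for orfs_with_stops($seq);
```

Because the groups are looked up by name, adding or removing a group later does not renumber the others.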


    Give a man a fish:  <%-{-{-{-<