in reply to Re: Regular expression problems
in thread Regular expression problems

Thanks for the help. I used this as an example, but I will be using it to find an ORF: a start codon 'M', followed by any of the other amino acids, up to a stop codon '_'. I think that will be =~/(M)[GAVLIFWPSTCYNQDEKRH]+(_)/ or =~/(M[GAVLIFWPSTCYNQDEKRH]+_)/. I'm not sure, but I will try both. Thanks again.
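
For anyone comparing the two patterns above, here is a minimal sketch of what each one captures. The test string and the sub names captures_separate and captures_whole are made up for illustration and are not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical test string, not taken from the original post.
my $seq = 'AAMGAV_CC';

# Pattern 1: two groups, capturing only the start and stop symbols.
sub captures_separate {
    my ($s) = @_;
    return $s =~ /(M)[GAVLIFWPSTCYNQDEKRH]+(_)/ ? ($1, $2) : ();
}

# Pattern 2: one group, capturing the whole ORF including M and _.
sub captures_whole {
    my ($s) = @_;
    return $s =~ /(M[GAVLIFWPSTCYNQDEKRH]+_)/ ? ($1) : ();
}

print join(',', captures_separate($seq)), "\n";   # prints: M,_
print join(',', captures_whole($seq)), "\n";      # prints: MGAV_
```

So the first form only hands back the delimiters, while the second returns the full ORF. One thing to watch: the character class as written does not contain M, so for a sequence like MAAMGG_ the match would begin at the second M. Whether that is what you want depends on how you define an ORF.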

Replies are listed 'Best First'.
Re^3: Regular expression problems
by brx (Pilgrim) on Apr 24, 2012 at 14:16 UTC
    You may want to put 'M' and '_' inside the parens to capture everything in one shot, or concatenate 'M'.$1.'_', depending on what you want to do. One way to do it:
    #!/usr/bin/perl -w
    use strict;

    my $seqnum=1;
    while (my $seq=<DATA>) {
        chomp $seq;
        print "sequence #",$seqnum++,":\n";
        while ($seq =~ /M([GAVLIFWPSTCYNQDEKRH]+)_/g) {
            print "\t",$1,"\n";
            # or: print "\t","M${1}_","\n";
        }
    }
    __DATA__
    MHGRRRRRRRRRRRRRRRRRRRRRRRRRRRRRD_MHGRRRRRRRRD_
    CMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVTECMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVT
    AWPPPPPPPPPPPPPPPPPPPPPPPPPPPPP_LNAWPPPPPPPPPPPPPPPPPPPPPPPPPPPPP_L
    FOOBARMNOTTHISONE_XYZMTHISYES_MTHISNOT_