Just one question: In the sequence 'atgaaaaa' (which is not terminated by any of (taa|tag|tga)), what should be matched? From the discussion in the thread so far, I assume the answer is 'nothing'.
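Under that assumption, an unterminated stretch produces no match at all. Here is a minimal check, assuming the pattern is `atg`, then a lazy run of bases, then a lookahead for a stop codon. The helper name `seqs_after_start` is hypothetical and used only for this illustration:

```perl
use strict;
use warnings;
use 5.010;    # \K needs Perl 5.10+

# Hypothetical helper: return every sub-sequence that follows an 'atg' and
# is immediately followed by a stop codon (taa, tag or tga). \K drops the
# 'atg' from the reported match, so no capture group is needed.
sub seqs_after_start {
    my ($dna) = @_;
    my @found = $dna =~ m{ atg \K [acgt]+? (?= taa | tag | tga ) }xg;
    return @found;
}

for my $dna ('atgaaaaa', 'atgaaataa') {
    my @found = seqs_after_start($dna);
    printf "%-10s => %s\n", $dna, @found ? join(', ', @found) : '(nothing)';
}
```

Run as is, it should report `(nothing)` for 'atgaaaaa' and `aaa` for 'atgaaataa'. The lazy `+?` cannot find a stop codon before the end of the string, so the match fails instead of running on to the end.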
With that assumption in hand, here's a small variation on BrowserUk's approach that is easily adapted to capture all kinds of information about each match. It needs Perl 5.10 or later for ${^MATCH}, \K and the //p regex modifier. If only the matching sub-sequences are needed, the matches can be assigned directly to an array. Because it uses no capture groups, it may be slightly faster, but I have not checked this with Benchmark.
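Whether dropping the capture group actually helps can be measured. The following is only a sketch using `cmpthese` from the core Benchmark module; the sub names `with_capture` and `with_keep` and the 100-fold repetition of the sample sequence are arbitrary choices for illustration. No timings are claimed here, since results will depend on the Perl version and the data.

```perl
use strict;
use warnings;
use 5.010;                  # \K needs Perl 5.10+
use Benchmark qw(cmpthese);

# Repeat the sample sequence so each call does a non-trivial amount of work.
my $dna = 'atctcggataatgggataaaaatataggctataaatggcgccccggctaattttt' x 100;

# Variant 1: a capture group returns only the bases between 'atg' and the stop codon.
# The list assignment forces list context, so //g collects every match on each call
# even when Benchmark calls the sub in void context.
sub with_capture {
    my @found = $dna =~ m{ atg ( [acgt]+? ) (?= taa | tag | tga ) }xg;
    return @found;
}

# Variant 2: no capture group; \K drops the 'atg' from the match instead.
sub with_keep {
    my @found = $dna =~ m{ atg \K [acgt]+? (?= taa | tag | tga ) }xg;
    return @found;
}

# Sanity check: both variants must find exactly the same sub-sequences.
my @via_capture = with_capture();
my @via_keep    = with_keep();
die "variants disagree\n" unless "@via_capture" eq "@via_keep";

# A negative count tells Benchmark to run each variant for at least
# that many CPU seconds; cmpthese then prints a rate comparison table.
cmpthese(-1, {
    capture => \&with_capture,
    keep    => \&with_keep,
});
```

The sanity check matters: a benchmark of two variants is only meaningful if they return identical results.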
```
>perl -wMstrict -le
"my $dna = 'atctcggataatgggataaaaatataggctataaatggcgccccggctaattttt';
;;
my @sub_seqs;
push @sub_seqs, [ ${^MATCH}, $-[0] ]
  while $dna =~ m{ atg \K [acgt]+? (?= taa | tag | tga) }xmspg;
;;
printf qq{%d sub-sequence(s) \n}, scalar @sub_seqs;
;;
print $dna if @sub_seqs;
for my $ar_sub_seq (@sub_seqs) {
  my $cursor = ('-' x $ar_sub_seq->[1]) . ('^' x length $ar_sub_seq->[0]);
  print $cursor;
  }
;;
my @ss = $dna =~ m{ atg \K [acgt]+? (?= taa | tag | tga) }xmspg;
printf qq{'$_' } for @ss;
"
2 sub-sequence(s)
atctcggataatgggataaaaatataggctataaatggcgccccggctaattttt
-------------^^^
-------------------------------------^^^^^^^^^^
'gga' 'gcgccccggc'
```
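The one-liner above stores a `[ ${^MATCH}, $-[0] ]` pair for each match. As a sketch of how the same loop can record more, the version below also keeps the end offset from `@+` and reads the terminating stop codon with `substr`, still without any capture groups. The hash keys (`seq`, `start`, `end`, `stop`) are arbitrary names chosen for this example.

```perl
use strict;
use warnings;
use 5.010;    # \K, ${^MATCH} and the /p modifier need Perl 5.10+

my $dna = 'atctcggataatgggataaaaatataggctataaatggcgccccggctaattttt';

# Collect one record per match. $-[0] and $+[0] hold the start and end
# offsets of the most recent match; because of \K they refer only to the
# part after 'atg'.
my @records;
while ($dna =~ m{ atg \K [acgt]+? (?= taa | tag | tga ) }xpg) {
    push @records, {
        seq   => ${^MATCH},
        start => $-[0],
        end   => $+[0],                   # offset just past the sub-sequence
        stop  => substr($dna, $+[0], 3),  # the lookahead guarantees a stop codon here
    };
}

for my $r (@records) {
    printf "%-12s start=%2d end=%2d length=%2d stop=%s\n",
        $r->{seq}, $r->{start}, $r->{end}, $r->{end} - $r->{start}, $r->{stop};
}
```

For the sample sequence this should report 'gga' at offsets 13 to 16 and 'gcgccccggc' at 37 to 47, each followed by taa, consistent with the cursor lines in the one-liner's output.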
In reply to Re: Simple regex question. Grouping with a negative lookahead assertion.
by AnomalousMonk
in thread Simple regex question. Grouping with a negative lookahead assertion.
by Anonymous Monk