Try something like:

c:\@Work\Perl\monks>perl -wMstrict -le
"use Data::Dump qw(dd);
 ;;
 my $seq = 'xABCxABCxxxWXYxWXZxxxABCxxWXYx';
 ;;
 my $subseq = qr{ ABC \w* (?: WXY | WXZ) }xms;
 ;;
 my @all = find_all($seq, $subseq);
 dd \@all;
 ;;
 ;;
 sub find_all {
   my ($seq, $regex) = @_;
   ;;
   local our @hits;
   use re 'eval';
   $seq =~ m{ ($regex) (?{ push @hits, [ $^N, $-[1] ] }) (?!) }xmsg;
   ;;
   return @hits;
   }
 "
[
  ["ABCxABCxxxWXYxWXZxxxABCxxWXY", 1],
  ["ABCxABCxxxWXYxWXZ", 1],
  ["ABCxABCxxxWXY", 1],
  ["ABCxxxWXYxWXZxxxABCxxWXY", 5],
  ["ABCxxxWXYxWXZ", 5],
  ["ABCxxxWXY", 5],
  ["ABCxxWXY", 21],
]
(I'm just using ...ABCxxWXY... to make the permutations and overlaps clear.)

(Update: The second item in each returned array reference is the zero-based offset of the start of the matching subsequence.)
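
For readability, here is the same one-liner unpacked into a script with comments. This is only a sketch of how I read the trick: the capture group grabs a candidate, the embedded code block records it, and (?!) then fails unconditionally so the regex engine backtracks into every other way of matching. The comments are my interpretation, not documentation.

```perl
use strict;
use warnings;

# Collect every way $regex can match inside $seq, overlaps included.
sub find_all {
    my ($seq, $regex) = @_;

    # Package variable, as in the one-liner; it is filled from inside
    # the (?{ ... }) code block below.
    local our @hits;

    # Kept from the one-liner: allows (?{ ... }) in a pattern that also
    # interpolates $regex at run time.
    use re 'eval';

    $seq =~ m{
        ($regex)                            # capture one candidate match
        (?{ push @hits, [ $^N, $-[1] ] })   # save its text and start offset
        (?!)                                # always fail: forces backtracking
    }xmsg;

    return @hits;
}

my $seq    = 'xABCxABCxxxWXYxWXZxxxABCxxWXYx';
my $subseq = qr{ ABC \w* (?: WXY | WXZ ) }xms;

# One line per hit: offset, then the matched text.
for my $hit (find_all($seq, $subseq)) {
    my ($text, $offset) = @$hit;
    print "$offset\t$text\n";
}
```

Because the overall match never succeeds, the only result is whatever the code block pushed onto @hits along the way.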

Update: Using your original sequence:

c:\@Work\Perl\monks>perl -wMstrict -le
"use Data::Dump qw(dd);
 ;;
 my $seq = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAA';
 ;;
 my $subseq = qr{ ATG \w* (?: TAG | TAA | TGA) }xms;
 ;;
 my @all = find_all($seq, $subseq);
 dd \@all;
 ;;
 ;;
 sub find_all {
   my ($seq, $regex) = @_;
   ;;
   local our @hits;
   use re 'eval';
   $seq =~ m{ ($regex) (?{ push @hits, [ $^N, $-[1] ] }) (?!) }xmsg;
   ;;
   return @hits;
   }
 "
[
  ["ATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAA", 1],
  ["ATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGA", 1],
  ["ATGGTTTCTCCCATCTCTCCATCGGCATAA", 1],
  ["ATGATCTAA", 40],
]
This works with Perl 5.8+ regexes. What version of Perl are you using? It might make a difference in the future.

Update 2: Remembering that DNA sequences may sometimes be loooong, it may be advantageous to pass the sequence by reference. Note that both the function and the function invocation must change.

c:\@Work\Perl\monks>perl -wMstrict -le
"use Data::Dump qw(dd);
 ;;
 my $seq = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAA';
 ;;
 my $subseq = qr{ ATG \w* (?: TAG | TAA | TGA) }xms;
 ;;
 my @all = find_all(\$seq, $subseq);
 dd \@all;
 ;;
 ;;
 sub find_all {
   my ($sr_seq, $regex) = @_;
   ;;
   local our @hits;
   use re 'eval';
   $$sr_seq =~ m{ ($regex) (?{ push @hits, [ $^N, $-[1] ] }) (?!) }xmsg;
   ;;
   return @hits;
   }
 "
[
  ["ATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAA", 1],
  ["ATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGA", 1],
  ["ATGGTTTCTCCCATCTCTCCATCGGCATAA", 1],
  ["ATGATCTAA", 40],
]
Still runs under Perl 5.8.
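
If you'd rather keep this in a file than on the command line, the pass-by-reference version might look something like the sketch below. The name find_all_ref and the printf report are my own inventions for illustration; the matching logic is unchanged from the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical name: same search as find_all(), but the sequence arrives
# as a scalar reference so a long string is not copied on the way in.
sub find_all_ref {
    my ($sr_seq, $regex) = @_;

    local our @hits;

    use re 'eval';
    $$sr_seq =~ m{                          # dereference to match the caller's string
        ($regex)
        (?{ push @hits, [ $^N, $-[1] ] })   # record text and start offset
        (?!)                                # fail on purpose to keep backtracking
    }xmsg;

    return @hits;
}

my $seq    = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAA';
my $subseq = qr{ ATG \w* (?: TAG | TAA | TGA ) }xms;

# Note the backslash: the caller passes \$seq, not $seq.
for my $hit (find_all_ref(\$seq, $subseq)) {
    my ($text, $offset) = @$hit;
    printf "offset %2d, length %2d: %s\n", $offset, length($text), $text;
}
```

Each hit still carries a copy of the matched text, so for really long sequences you might prefer to store just the offset and length and pull the text out with substr when you need it.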


Give a man a fish:  <%-{-{-{-<


In reply to Re^3: Using Recursion to Find DNA Sequences by AnomalousMonk
in thread Using Recursion to Find DNA Sequences by clueless_perl
