
Try something like:

c:\@Work\Perl\monks>perl -wMstrict -le
"use Data::Dump qw(dd);
 ;;
 my $seq = 'xABCxABCxxxWXYxWXZxxxABCxxWXYx';
 ;;
 my $subseq = qr{ ABC \w* (?: WXY | WXZ) }xms;
 ;;
 my @all = find_all($seq, $subseq);
 dd \@all;
 ;;
 ;;
 sub find_all {
   my ($seq, $regex) = @_;
   ;;
   local our @hits;
   use re 'eval';
   $seq =~ m{
     ($regex) (?{ push @hits, [ $^N, $-[1] ] }) (?!)
     }xmsg;
   ;;
   return @hits;
   }
"
[
  ["ABCxABCxxxWXYxWXZxxxABCxxWXY", 1],
  ["ABCxABCxxxWXYxWXZ", 1],
  ["ABCxABCxxxWXY", 1],
  ["ABCxxxWXYxWXZxxxABCxxWXY", 5],
  ["ABCxxxWXYxWXZ", 5],
  ["ABCxxxWXY", 5],
  ["ABCxxWXY", 21],
]

(I'm just using ...ABCxxWXY... to make the permutations and overlaps clear.) (Update: The second item in each returned array reference is the base-0 offset of the start of the matching subsequence.)
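Because the offset is base-0, it can be handed straight to substr(). Here's a small sketch that checks this, using two of the match/offset pairs from the output above; the @hits list is copied by hand from that output, not recomputed.

```perl
use strict;
use warnings;

my $seq = 'xABCxABCxxxWXYxWXZxxxABCxxWXYx';

# Two of the [ match, offset ] pairs copied from the output above.
my @hits = (
    [ 'ABCxxxWXY', 5  ],
    [ 'ABCxxWXY',  21 ],
);

for my $hit (@hits) {
    my ($match, $offset) = @$hit;
    # substr() is also base-0, so the offset can be used directly.
    my $slice = substr($seq, $offset, length($match));
    print "$offset: ", ($slice eq $match ? 'ok' : 'MISMATCH'), "\n";
}
```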

Update: Using your original sequence:

c:\@Work\Perl\monks>perl -wMstrict -le
"use Data::Dump qw(dd);
 ;;
 my $seq = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAA';
 ;;
 my $subseq = qr{ ATG \w* (?: TAG | TAA | TGA) }xms;
 ;;
 my @all = find_all($seq, $subseq);
 dd \@all;
 ;;
 ;;
 sub find_all {
   my ($seq, $regex) = @_;
   ;;
   local our @hits;
   use re 'eval';
   $seq =~ m{
     ($regex) (?{ push @hits, [ $^N, $-[1] ] }) (?!)
     }xmsg;
   ;;
   return @hits;
   }
"
[
  ["ATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAA", 1],
  ["ATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGA", 1],
  ["ATGGTTTCTCCCATCTCTCCATCGGCATAA", 1],
  ["ATGATCTAA", 40],
]

This works with Perl 5.8+ regexes. What version of Perl are you using? It might make a difference in future.

Update 2: Remembering that DNA sequences may sometimes be loooong, it may be advantageous to pass the sequence by reference. Note that both the function and the function invocation must change.

c:\@Work\Perl\monks>perl -wMstrict -le
"use Data::Dump qw(dd);
 ;;
 my $seq = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAA';
 ;;
 my $subseq = qr{ ATG \w* (?: TAG | TAA | TGA) }xms;
 ;;
 my @all = find_all(\$seq, $subseq);
 dd \@all;
 ;;
 ;;
 sub find_all {
   my ($sr_seq, $regex) = @_;
   ;;
   local our @hits;
   use re 'eval';
   $$sr_seq =~ m{
     ($regex) (?{ push @hits, [ $^N, $-[1] ] }) (?!)
     }xmsg;
   ;;
   return @hits;
   }
"
[
  ["ATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAA", 1],
  ["ATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGA", 1],
  ["ATGGTTTCTCCCATCTCTCCATCGGCATAA", 1],
  ["ATGATCTAA", 40],
]

Still runs under Perl 5.8.
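If you'd rather run this from a script file than from a Windows one-liner, here is a rough sketch of the same by-reference find_all() with the regex commented. The comments are my summary of how the idiom works, and the printf line is only one way to display the hits (it stands in for Data::Dump so the script has no non-core dependencies).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq    = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAA';
my $subseq = qr{ ATG \w* (?: TAG | TAA | TGA) }xms;

# Print each hit as "offset  subsequence".
printf "%2d  %s\n", $_->[1], $_->[0] for find_all(\$seq, $subseq);

sub find_all {
    my ($sr_seq, $regex) = @_;

    # A localized package variable rather than a 'my' variable: lexicals
    # used inside regex code blocks were unreliable in older Perls.
    local our @hits;
    use re 'eval';    # permit a code block next to an interpolated pattern

    $$sr_seq =~ m{
        ($regex)                            # capture one candidate match
        (?{ push @hits, [ $^N, $-[1] ] })   # record its text and start offset
        (?!)                                # always fails: forces backtracking
    }xmsg;

    return @hits;
}
```

The overall match never succeeds; the (?!) makes the regex engine back up and try every other way $regex can match, and the code block records each one on the way.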

Give a man a fish: <%-{-{-{-<

In reply to Re^3: Using Recursion to Find DNA Sequences by AnomalousMonk
in thread Using Recursion to Find DNA Sequences by clueless_perl
