in reply to I think I like Perl too much...

I really don't think an index-based solution would be easier to code than a regex-based one, and in any case the number of patterns isn't that large. It's "N choose R", where R is the maximum number of mismatches (3 in your example) and N is the length of the motif excluding the anchor ("aug"). For your example, that's only 35 patterns. Increase the length of the motif by 1 and it's 56; then 84, then 120. Not explosive.
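To check the arithmetic: 'gccrccaugg' is 10 characters, and dropping the 3-character anchor leaves N = 7. Below is a small sketch that computes C(N, 3) for N = 7 through 10 using the multiplicative formula. The helper name `n_choose_r` is hypothetical and is not part of the post. Note that it only counts the combinations, while the post's `N_choose_R` actually enumerates them.

```perl
use strict;
use warnings;

# Hypothetical helper: number of ways to pick $r wildcard positions out of
# $n non-anchor positions. Uses C(n, r) = prod_{i=1..r} (n - r + i) / i;
# after step i the running value equals C(n - r + i, i), so it stays integral.
sub n_choose_r {
    my ( $n, $r ) = @_;
    return 0 if $r < 0 || $r > $n;
    my $c = 1;
    for my $i ( 1 .. $r ) {
        $c = $c * ( $n - $r + $i ) / $i;
    }
    return $c;
}

# Non-anchor motif lengths 7..10, 3 mismatches each.
print join( ', ', map { n_choose_r( $_, 3 ) } 7 .. 10 ), "\n";    # prints: 35, 56, 84, 120
```

Each added base raises the count only polynomially, roughly as N^3/6 for R = 3. That is why the growth is "not explosive".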
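    use strict;
    use warnings;

    # Every way to choose $n distinct positions from 0 .. $l-1,
    # each returned as an array ref of indices in descending order.
    sub N_choose_R {
        my( $l, $n ) = @_;
        $n > $l and die "$n > $l !!!\n";
        $n == 1 and return( reverse map { [$_] } 0 .. $l-1 );
        $n == $l and return [ reverse( 0 .. $l-1 ) ];
        my @lists;
        for my $s ( reverse( $n-1 .. $l-1 ) ) {
            # Fix the largest index at $s; choose the other $n-1 below it.
            push @lists, map { [ $s, @$_ ] } N_choose_R( $s, $n-1 );
        }
        @lists
    }

    my $motif          = 'gccrccaugg';
    my $anchor         = 'aug';
    my $max_mismatches = 3;

    # The anchor must match exactly, so only the flanks get wildcards.
    my( $pre, $post ) = split $anchor, $motif, 2;
    my $anchor_ofs = length $pre;

    # '.' also matches the original base, so each pattern allows
    # up to $max_mismatches mismatches.
    my @patterns = map {
        my $p = $motif;
        for my $i ( @$_ ) {
            # Shift positions past the anchor back into motif coordinates.
            substr( $p, $i < $anchor_ofs ? $i : $i + length($anchor), 1 ) = '.';
        }
        $p =~ s/r/[ag]/g;    # IUPAC 'r' = a or g
        $p
    } N_choose_R( length( $pre.$post ), $max_mismatches );

    my $re = join '|', @patterns;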

Re^2: I think I like Perl too much...
by tlm (Prior) on Mar 31, 2005 at 16:28 UTC

    Agreed, the difference in code complexity is not big, but the difference in speed is: at least, my implementations of the two solutions differed in speed by an order of magnitude. Besides, this is all child's play compared to the algorithms discussed in earlier threads like this one and this one (thanks to demerphq for the heads-up).

    the lowliest monk
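Neither post shows the index-based version that was timed. The following is only a hypothetical sketch of what such an approach could look like. It is not tlm's code, and it implies nothing about speed. It slides a window the length of the motif along the sequence, requires the anchor to match exactly, and counts mismatches at the remaining positions, treating 'r' as a or g. The name `find_hits` and the sample sequences are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical index-based scan (not from the thread). Returns [offset, mismatches]
# for every window whose anchor matches exactly and whose other positions
# mismatch the motif at most $max_mm times.
sub find_hits {
    my ( $seq, $motif, $anchor, $max_mm ) = @_;
    my $mlen       = length $motif;
    my $alen       = length $anchor;
    my $anchor_ofs = index $motif, $anchor;    # first occurrence, like split
    die "anchor '$anchor' not in motif\n" if $anchor_ofs < 0;
    my @motif = split //, $motif;
    my @hits;
  WINDOW:
    for my $start ( 0 .. length($seq) - $mlen ) {
        # Cheap exact test on the anchor first.
        next WINDOW
          unless substr( $seq, $start + $anchor_ofs, $alen ) eq $anchor;
        my $mm = 0;
        for my $i ( 0 .. $mlen - 1 ) {
            next if $i >= $anchor_ofs && $i < $anchor_ofs + $alen;    # skip anchor
            my $c  = substr $seq, $start + $i, 1;
            my $m  = $motif[$i];
            my $ok = $m eq 'r' ? ( $c eq 'a' || $c eq 'g' ) : $c eq $m;
            next if $ok;
            next WINDOW if ++$mm > $max_mm;    # too many mismatches: abandon window
        }
        push @hits, [ $start, $mm ];
    }
    return @hits;
}

for my $hit ( find_hits( 'ttgccaccauggtt', 'gccrccaugg', 'aug', 3 ) ) {
    printf "offset %d, %d mismatch(es)\n", @$hit;    # prints: offset 2, 0 mismatch(es)
}
```

Unlike a single `m/$re/g` pass over the alternation, this version visits every window, so overlapping hits are reported too.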