in reply to Regexps for microsatellites

The problem is incompletely defined. What do you want if you encounter "CATCATCATCATCATCATCAT"? Do you want the longest match (7 "CAT"s, for a total length of 21), or do you want to match using the longest repeating sequence (3 "CATCAT"s, for a total length of 18)? You only mentioned the case where the matches had the same length.
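
To make the two readings concrete, here is a minimal sketch. It assumes a unit of 1 to 6 characters repeated at least three times; the helper name `first_repeat` and the exact regex are illustrative and not taken from the original post. The only difference between the two readings is whether the unit is captured greedily or lazily.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (unit, copies, total length) for the first
# tandem repeat of at least three copies, or an empty list if there is none.
sub first_repeat {
    my ($seq, $greedy) = @_;
    # Greedy capture prefers the longest unit, lazy capture the shortest.
    my $re = $greedy ? qr/(.{1,6})\1{2,}/ : qr/(.{1,6}?)\1{2,}/;
    return unless $seq =~ $re;
    my $total = $+[0] - $-[0];    # length of the whole match
    return ($1, $total / length($1), $total);
}

my $seq = 'CATCATCATCATCATCATCAT';
print join(' ', first_repeat($seq, 1)), "\n";    # CATCAT 3 18
print join(' ', first_repeat($seq, 0)), "\n";    # CAT 7 21
```

With the greedy unit, the trailing "CAT" is left unmatched, which is why the total drops from 21 to 18.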

Re^2: Regexps for microsatellites
by Roy Johnson (Monsignor) on Nov 08, 2004 at 16:16 UTC
    The fact that the OP looks for patterns in order of increasing length suggests that the pattern should not include a repeat.

    Caution: Contents may have been coded under pressure.

      If that's so, then your solution doesn't work. It can easily be fixed by substituting (.{1,6}?) for the existing (.{1,6}).

      Update: Nope, adding '?' is no good, because it'll think AAAGTCAAAGTC is Ax3 instead of AAAGTCx2.

        AAAGTC is Ax3, assuming a match of three or more counts. Shorter matches get preference.
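
The disagreement above seems to turn on the minimum number of copies. The sketch below assumes the regex under discussion has the general shape `(.{1,6})\1...`, since the original is not quoted in this thread. The helper `find_repeat` and its parameters are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: find the first tandem repeat with at least $min
# copies of a 1-6 character unit, and report it as "UNITxCOPIES".
sub find_repeat {
    my ($seq, $lazy, $min) = @_;
    my $unit = $lazy ? '.{1,6}?' : '.{1,6}';
    # One copy is captured, so $min - 1 further copies must follow it.
    my $re = sprintf '(%s)\1{%d,}', $unit, $min - 1;
    return unless $seq =~ /$re/;
    return $1 . 'x' . (($+[0] - $-[0]) / length($1));
}

my $seq = 'AAAGTCAAAGTC';
print find_repeat($seq, 0, 2), "\n";    # AAAGTCx2 (greedy, two or more copies)
print find_repeat($seq, 1, 2), "\n";    # Ax3      (lazy, two or more copies)
print find_repeat($seq, 0, 3), "\n";    # Ax3      (greedy, three or more copies)
print find_repeat($seq, 1, 3), "\n";    # Ax3      (lazy, three or more copies)
```

Under these assumptions, the lazy quantifier only changes the answer when two copies are enough. Once three copies are required, AAAGTC cannot qualify in this string, and both forms report Ax3.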
Re^2: Regexps for microsatellites
by knirirr (Scribe) on Nov 09, 2004 at 10:04 UTC
    In this case it's definitely (CAT)7: if the pattern can be broken down into identical parts, it is treated as the smaller microsatellite. I have some code to do that, e.g.:

        # Skip matches that are just repeats of their first two characters
        my $v1 = substr($match, 0, 1);
        my $v2 = substr($match, 1, 1);
        next if ($match =~ /^($v1$v2){2,}$/);
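
The same idea can be generalised beyond two-character units. The following is a sketch only, not the code referred to above, and `smallest_unit` is a hypothetical name. It reduces a candidate to its shortest repeating part with a lazy capture anchored at both ends.

```perl
use strict;
use warnings;

# Hypothetical helper: return the shortest unit that, repeated two or more
# times, makes up the whole string; return the string itself otherwise.
sub smallest_unit {
    my ($match) = @_;
    # The lazy (.+?) tries the shortest prefix first, and \1+ followed by $
    # requires the rest of the string to consist of copies of that prefix.
    return $match =~ /^(.+?)\1+$/ ? $1 : $match;
}

print smallest_unit('CATCAT'), "\n";    # CAT
print smallest_unit('ACACAC'), "\n";    # AC
print smallest_unit('AAAGTC'), "\n";    # AAAGTC
```

A candidate could then be skipped whenever `smallest_unit($match) ne $match`, which covers units of any length rather than just the first two characters.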