in reply to Re^4: Regexps for microsatellites
in thread Regexps for microsatellites
So two questions for knirirr:
1) Should CATCATCATCATCAT give (a) CATx5 or (b) CATCATx2?
2) Should AAAGTCAAAGTCAAAGTC give (a) Ax3 or (b) AAAGTCx3?
This can be rephrased as:
Should we favour longest match (1a and 2b),
should we favour longest sequence (1b and 2b), or
should we favour shortest sequence (1a and 2a)?
Until then, we have:
    # Favour shortest sequence.
    my $thresh = 1;  # Match *more than* $thresh times.

    while (<DATA>) {
        chomp;
        while (/((.{1,6}?)\2{$thresh,})/g) {
            printf(
                "Found %d %-6s (length=%d, total=%2d) at pos %2d in %s\n",
                length($1) / length($2),  # Number of matches.
                $2,                       # Sequence.
                length($2),               # Length of sequence.
                length($1),               # Length of match.
                $-[0],                    # Start position.
                $_                        # String we're searching.
            );
        }
    }

    __DATA__
    CATCATCATCATCAT
    AAAGTCAAAGTCAAAGTC

gives:

    Found 5 CAT    (length=3, total=15) at pos  0 in CATCATCATCATCAT
    Found 3 A      (length=1, total= 3) at pos  0 in AAAGTCAAAGTCAAAGTC
    Found 2 GTCAAA (length=6, total=12) at pos  3 in AAAGTCAAAGTCAAAGTC
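For comparison, here is a rough sketch of how the other two policies might look. The sub names `longest_sequence` and `longest_match` are made up for illustration, and the tie-breaking rule in `longest_match` (the shorter unit wins when two candidates cover the same number of characters) is an assumption, not something settled in this thread. Favouring the longest sequence only needs the `?` dropped so that `.{1,6}` becomes greedy. Favouring the longest match can't be done by greediness alone, so the sketch tries every unit length at each position and keeps whichever covers the most characters.

```perl
use strict;
use warnings;

# Favour longest sequence: same regex as above, but with a greedy .{1,6}.
# Returns a list of [repeat count, unit, start position].
sub longest_sequence {
    my ($str, $thresh) = @_;
    my @found;
    while ($str =~ /((.{1,6})\2{$thresh,})/g) {
        push @found, [ length($1) / length($2), $2, $-[0] ];
    }
    return @found;
}

# Favour longest match: at each position, try every unit length from 1 to 6
# and keep the candidate covering the most characters.
# Assumption: on a tie, the shorter unit wins (it is tried first).
sub longest_match {
    my ($str, $thresh) = @_;
    my @found;
    my $pos = 0;
    while ($pos < length $str) {
        my $rest = substr($str, $pos);
        my ($best_unit, $best_total) = ('', 0);
        for my $len (1 .. 6) {
            if ($rest =~ /^((.{$len})\2{$thresh,})/) {
                ($best_unit, $best_total) = ($2, length $1)
                    if length($1) > $best_total;
            }
        }
        if ($best_total) {
            push @found, [ $best_total / length($best_unit), $best_unit, $pos ];
            $pos += $best_total;   # Resume after the match, as //g would.
        }
        else {
            $pos++;
        }
    }
    return @found;
}

for my $str (qw(CATCATCATCATCAT AAAGTCAAAGTCAAAGTC)) {
    printf "longest sequence: %d x %s at pos %d in %s\n", @$_, $str
        for longest_sequence($str, 1);
    printf "longest match:    %d x %s at pos %d in %s\n", @$_, $str
        for longest_match($str, 1);
}
```

With `$thresh = 1`, `longest_sequence` should report CATCATx2 and AAAGTCx3 (answers 1b and 2b), while `longest_match` should report CATx5 and AAAGTCx3 (answers 1a and 2b).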
Re^6: Regexps for microsatellites
by knirirr (Scribe) on Nov 09, 2004 at 10:28 UTC