knirirr has asked for the wisdom of the Perl Monks concerning the following question:
The current approach makes one pass over the sequence for each motif length from 1 to 6:

    $threshold = "N";   # see description above
    for ($i = 1; $i <= 6; $i++) {
        my $pattern = "." x $i;
        while ($sequence_string =~ /($pattern)\1{$threshold,}/g) {
            ....
        }
    }

When code is added to make sure that (ATT)12 is not also detected as (ATT)6, etc., this works quite well, and it also gives me length($1), which is important. But because it reads through the sequence 6 times, it is quite slow. I'd like to replace it with a regexp that reads through the same string only once and finds the same things. I have something like this:

    while ($sequence_string =~ /(.)\1{$threshold,}|(.{2,3}?)\2{$threshold,}|(.{4,6}?)\3{$threshold,}/g)

I'm not really satisfied with this, though. Can anyone suggest a cleaner way to do it? It would be handy if I could iterate from 1 to the total genome length and, at each position, attempt to find all possible motifs above the threshold length. If none is found, move forward to the next character; if one is, move forward to the character after the end of the pattern found. Perhaps something like:

    $i = 1;
    while ($i < $genome_length) {
        for ($j = 1; $j <= 6; $j++) {
            # run regexp looking for motifs of length $j
            # starting at position $i
        }
        $i = (end of previous match + 1)
    }

Any suggestions would be welcome. Thanks!
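One way the single-pass idea could be sketched is with a non-greedy motif group, so that at each start position the regex engine tries motif lengths 1, 2, ... 6 in that order and /g resumes after the end of each match. This is only an illustration, not a tuned solution: the helper name find_microsatellites, its parameters, and the test sequence are assumptions made here, and $min_repeats counts the total number of copies (so it corresponds to $threshold + 1 in the patterns above).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): scans $seq once and
# returns one hash ref per tandem repeat with at least $min_repeats copies
# of a motif that is 1..$max_motif characters long.
sub find_microsatellites {
    my ($seq, $min_repeats, $max_motif) = @_;
    $max_motif ||= 6;
    my $extra = $min_repeats - 1;    # copies required after the first one
    my @hits;

    # (.{1,N}?) is non-greedy, so the shortest motif that repeats often
    # enough wins at each position: (ATT)x8 is reported as ATT, not ATTATT.
    # \2{$extra,} is greedy, so the run is extended as far as it goes.
    while ($seq =~ /((.{1,$max_motif}?)\2{$extra,})/g) {
        my ($run, $motif) = ($1, $2);
        push @hits, {
            start   => $-[1],                          # 0-based offset of the run
            motif   => $motif,
            repeats => length($run) / length($motif),
            length  => length($run),
        };
    }
    return @hits;
}

# Made-up test sequence: (ATT)5, (AC)6 and (T)8 separated by G's.
my $seq  = 'GG' . 'ATT' x 5 . 'G' . 'AC' x 6 . 'G' . 'T' x 8 . 'G';
my @hits = find_microsatellites($seq, 4);
printf "%d %s x%d\n", @{$_}{qw(start motif repeats)} for @hits;
# prints:
# 2 ATT x5
# 18 AC x6
# 31 T x8
```

Because the leftmost match wins, the reported motif depends on where the run starts: a run such as CACACACACACA is reported with motif CA rather than AC. The string is scanned once, but the engine may still try up to six motif lengths at every start position, so any speed-up over the six-pass loop would need to be measured on real data.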
Janitored by Arunbear - replaced pre tags with code tags, as per Monastery guidelines
Replies are listed 'Best First'.
Re: Regexps for microsatellites
by Roy Johnson (Monsignor) on Nov 08, 2004 at 15:14 UTC
by ikegami (Patriarch) on Nov 08, 2004 at 16:18 UTC

Re: Regexps for microsatellites
by erix (Prior) on Nov 08, 2004 at 14:40 UTC

Re: Regexps for microsatellites
by bobf (Monsignor) on Nov 08, 2004 at 21:54 UTC

Re: Regexps for microsatellites
by tachyon (Chancellor) on Nov 09, 2004 at 01:40 UTC

Re: Regexps for microsatellites
by ikegami (Patriarch) on Nov 08, 2004 at 16:10 UTC
by Roy Johnson (Monsignor) on Nov 08, 2004 at 16:16 UTC
by ikegami (Patriarch) on Nov 08, 2004 at 16:23 UTC
by Roy Johnson (Monsignor) on Nov 08, 2004 at 18:01 UTC
by ikegami (Patriarch) on Nov 08, 2004 at 18:48 UTC
by knirirr (Scribe) on Nov 09, 2004 at 10:04 UTC