Hello,
I'm writing a script that will run a number of different, very simple regular-expression searches on a large (up to 300 million characters) string. The searches are really simple, things like:
    while ( $sequence =~ /CACGTG/g ) {
        print join("\t", $chr, '+', pos($sequence)-6), "\n";
    }
    while ( $sequence =~ /GTGCAC/g ) {
        print join("\t", $chr, '-', pos($sequence)-6), "\n";
    }
I have two questions about this. First, am I right to be doing these searches sequentially? Or ought I be trying to combine them all into one single regular expression? I chose the sequential approach because it seemed far more readable and easily scalable, but I didn't know if there was any reason to believe it was horribly inefficient or silly.
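For comparison, here is a rough sketch of what the combined version might look like. The toy sequence and the %strand_for lookup hash are made up for illustration and are not part of the real script. One thing to watch for: a single /g scan resumes after the end of each match, so a hit for one motif that overlaps a hit for another (for example GTGCAC starting inside CACGTGCAC) would be skipped, whereas the sequential loops find both. Whether the single pass is actually faster is something that would need benchmarking.

```perl
use strict;
use warnings;

# Hypothetical toy data; the real $sequence would be far longer.
my $chr      = 'chr1';
my $sequence = 'AACACGTGTTGTGCACAA';

# Hypothetical lookup: each motif mapped to the strand label to report.
my %strand_for = ( CACGTG => '+', GTGCAC => '-' );

# Build one alternation from the motif list and compile it once.
my $alternation = join '|', map { quotemeta } sort keys %strand_for;
my $motif_re    = qr/($alternation)/;

my @hits;
while ( $sequence =~ /$motif_re/g ) {
    # pos() is the offset just past the match, so subtract the matched length.
    my $start = pos($sequence) - length($1);
    push @hits, join( "\t", $chr, $strand_for{$1}, $start );
}
print "$_\n" for @hits;
```

Adding another motif then means adding one entry to the hash rather than another loop, at the cost of the overlap caveat above.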
Second, is my use of pos($sequence) - match_length to find the starting point of the match appropriate, or is there a simpler way of getting the match start?
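One alternative that avoids hard-coding the match length is Perl's built-in @- array: after a successful match, $-[0] holds the offset where the whole match started, and $+[0] holds the offset where it ended. A minimal sketch, again with a made-up toy sequence:

```perl
use strict;
use warnings;

# Hypothetical toy data for illustration only.
my $chr      = 'chr1';
my $sequence = 'TTCACGTGAACACGTG';

my @starts;
while ( $sequence =~ /CACGTG/g ) {
    # $-[0] is the start offset of the last successful match;
    # $+[0] is its end offset, the same value pos() reports here.
    push @starts, $-[0];
    print join( "\t", $chr, '+', $-[0] ), "\n";
}
```

This should give the same numbers as pos($sequence) - 6, but keeps working if the pattern length changes.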
Many thanks in advance for any thoughts/advice/comments.
In reply to Multiple Regex's on a Big Sequence by bernanke01