I'm interested in using weighted regular expressions in searching DNA/RNA sequences. So, say I take a set of information on important pieces of DNA. If I look at 12 pieces of DNA I might end up with the following matrix:
Position       A   T   G   C
nucleotide 1  25  25  25  25
nucleotide 2  10  15  50  25
nucleotide 3   0  90   5   5
nucleotide 4  12  16  32  40
where each number is a percentage weight (the percentage of the time that I can expect to see that nucleotide at that particular position).
So for nucleotide 1, a simple regex would be the character class [atgc], as everything is equally weighted.
Nucleotide 3 would be [tgc], as there is no weight for A.
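To make the presence-only version concrete, here is a minimal sketch that builds one character class per position from the nonzero entries of the matrix above. It is written in Python purely for illustration, and the names `BASES`, `WEIGHTS` and `presence_pattern` are made up for this example rather than taken from any library.

```python
import re

# Percentage weights per position, columns in the order A, T, G, C
# (values copied from the matrix above). The names are hypothetical.
BASES = "atgc"
WEIGHTS = [
    (25, 25, 25, 25),
    (10, 15, 50, 25),
    (0, 90, 5, 5),
    (12, 16, 32, 40),
]

def presence_pattern(weights):
    """Build a regex with one character class per position, keeping
    only the nucleotides whose weight is greater than zero."""
    classes = []
    for row in weights:
        allowed = "".join(b for b, w in zip(BASES, row) if w > 0)
        classes.append("[" + allowed + "]")
    return "".join(classes)

pattern = presence_pattern(WEIGHTS)
print(pattern)  # [atgc][atgc][tgc][atgc]

# Case-insensitive search, so ATGC and atgc are treated alike.
print(bool(re.search(pattern, "ggtg", re.IGNORECASE)))
```

A pattern like this only records which nucleotides are allowed at each position; the weights themselves are thrown away.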
I would like to write something that would pay attention to the weights at each position, not just the presence of nucleotides.
So nucleotide 3 would be t(90%)g(5%)c(5%) or whatever the correct regex pattern is.
Is this possible? Can anyone give me an example to send me on my way? I have looked in the Friedl book, but I didn't find anything terribly obvious...
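In case a plain regex cannot carry the weights, what I am picturing is roughly the sliding-window scoring below. This is not a regex at all, just a sketch of the behaviour I am after. It is in Python for illustration, the names `window_score` and `scan` are hypothetical, and it assumes the sequence contains only a, t, g and c. Multiplying the per-position weights is my own assumption about how to combine them, and so is the threshold.

```python
# Percentage weights per position, columns in the order A, T, G, C
# (values copied from the matrix above). The names are hypothetical.
BASES = "atgc"
WEIGHTS = [
    (25, 25, 25, 25),
    (10, 15, 50, 25),
    (0, 90, 5, 5),
    (12, 16, 32, 40),
]

def window_score(window, weights=WEIGHTS):
    """Multiply the per-position weights (as fractions) for one window.
    A nucleotide with weight 0 at any position gives a score of 0.
    Assumes the window contains only a, t, g or c."""
    score = 1.0
    for base, row in zip(window.lower(), weights):
        score *= row[BASES.index(base)] / 100.0
    return score

def scan(sequence, threshold, weights=WEIGHTS):
    """Yield (offset, window, score) for every window of the matrix
    width whose score is at least the threshold."""
    width = len(weights)
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = window_score(window, weights)
        if score >= threshold:
            yield i, window, score

# The threshold is an arbitrary cut-off picked for this example.
for offset, window, score in scan("ccagtcgg", threshold=0.02):
    print(offset, window, round(score, 4))
```

With this matrix the best possible window scores 0.25 * 0.50 * 0.90 * 0.40 = 0.045, so any useful threshold has to sit below that.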
Thanks for any help. Go raibh maith agat,
MadraghRua
yet another biologist hacking perl....
In reply to weighting regex patterns by MadraghRua