in reply to Multidimensional regular expressions
So in bioinformatics, an individual sequence can be considered an array. Comparing a pair of sequences is then like comparing one array with another. The dimensionality increases with the number of sequences that you want to compare. Typically a two-dimensional (pairwise) comparison is carried out n-1 times on the data set, once between the query sequence and each of the others, and each comparison yields a statistical score. You then pop the initial query sequence from the data set and carry on comparing the remaining sequences with each other in the same way until only one is left in the set. The statistical score is used to sort the results in terms of relatedness.
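Here is a rough sketch of that pop-and-compare loop, in Python just to keep it short. The function names (`similarity`, `all_pairs`) are made up for illustration, and the score is only a stand-in: it uses `difflib.SequenceMatcher` from the standard library rather than a real alignment score with proper statistics, so treat it as a toy and not what an actual alignment program computes.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in score in [0, 1]; NOT a real alignment statistic (assumption)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def all_pairs(seqs):
    """Compare every sequence against every other one exactly once.

    seqs maps a sequence name to its string. Returns a list of
    (score, query_name, other_name) tuples, best score first.
    """
    remaining = dict(seqs)  # work on a copy so the caller's data is untouched
    results = []
    while len(remaining) > 1:
        # Pop the current query, then compare it with everything still left.
        query_name = next(iter(remaining))
        query_seq = remaining.pop(query_name)
        for name, seq in remaining.items():
            results.append((similarity(query_seq, seq), query_name, name))
    # Sort by score so the most closely related pairs come first.
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

With n sequences this does (n-1) + (n-2) + ... + 1 comparisons, so n(n-1)/2 in total, which is why the cost grows quickly as the data set gets bigger.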
This might then be represented as a tree of sequences with branches and proximity indicating closeness of similarity, or a multiple sequence alignment where the distance of two sequences from each other in the alignment indicates their degree of similarity. You might look into a program called ClustalW for some examples of how this is done.
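To give a feel for the tree idea, here is a minimal sketch that repeatedly merges the two closest clusters (plain average-linkage clustering). This is not ClustalW's actual method, just a naive illustration. The names `distance`, `leaves` and `build_tree` are hypothetical, and the distance is a toy: the fraction of mismatched positions, assuming the sequences are already the same length.

```python
from itertools import combinations


def distance(a, b):
    """Toy distance: fraction of mismatched positions (assumes equal lengths)."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def leaves(tree):
    """List the sequence names under a tree node (a name or a 2-tuple)."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def build_tree(seqs):
    """Merge the two closest clusters until one nested tuple remains."""
    if not seqs:
        raise ValueError("need at least one sequence")

    def avg_distance(c1, c2):
        # Average distance over all pairs of members of the two clusters.
        pairs = [(x, y) for x in leaves(c1) for y in leaves(c2)]
        return sum(distance(seqs[x], seqs[y]) for x, y in pairs) / len(pairs)

    clusters = list(seqs)  # start with each sequence name as its own cluster
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg_distance(*p))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append((c1, c2))  # the merged pair becomes a new branch
    return clusters[0]
```

The result is a nested tuple where sequences that were merged early sit close together, which is the "proximity indicates similarity" idea in miniature. It does not compute branch lengths.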
I hope this adds some fuel to your fire.
MadraghRua
yet another biologist hacking perl....