in reply to Multidimensional regular expressions

It strikes me that something like this has already been encountered in bioinformatics. Try Algorithms on Strings, Trees and Sequences by Dan Gusfield or Computational Molecular Biology by Pavel Pevzner.

So in bioinformatics, an individual sequence can be considered an array. Comparing a pair of sequences is then like comparing an array with an array, and the dimensionality increases with the number of sequences that you want to compare. Typically a two-dimensional (pairwise) comparison is carried out n-1 times on the data set, matching the initial query sequence against each of the others, and each comparison results in a statistical score. You then pop the initial query sequence from the data set and carry on the comparison with the remaining sequences until you have only one left in the set. The statistical scores are used to sort the results in terms of relatedness.
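To make the pop-and-compare loop concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the names nw_score and all_pairs are made up for this post, and the score is a plain Needleman-Wunsch global alignment score with arbitrary match/mismatch/gap values of 1/-1/-1, not the statistical score a real tool would report.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b (Needleman-Wunsch, score only).

    The scoring values are illustrative assumptions, not a real
    substitution matrix.
    """
    # prev holds the previous row of the 2-D dynamic programming table.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def all_pairs(seqs):
    """Pop a query, compare it with every remaining sequence, repeat."""
    remaining = list(seqs)
    results = []
    while len(remaining) > 1:
        query = remaining.pop(0)
        for other in remaining:
            results.append((nw_score(query, other), query, other))
    # Highest score first, i.e. most closely related pairs at the top.
    results.sort(key=lambda r: r[0], reverse=True)
    return results


if __name__ == "__main__":
    for score, s1, s2 in all_pairs(["GATTACA", "GATTAGA", "CCCC"]):
        print(score, s1, s2)
```

For n sequences this does n(n-1)/2 pairwise comparisons in total, each one filling a two-dimensional table, which is why the cost climbs quickly as the data set grows.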

This might then be represented as a tree of sequences, with branches and proximity indicating the degree of similarity, or as a multiple sequence alignment, where the distance of two sequences from each other in the alignment indicates their degree of similarity. You might look into a program called ClustalW for some examples of how this is done.
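As a rough idea of how pairwise distances can be turned into a tree, here is a small Python sketch. To be clear, this is not ClustalW's own method: it is simple average-linkage clustering, and p_distance and build_tree are hypothetical names of mine. It also assumes the sequences have distinct names, are already the same length, and that there is at least one of them.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions (assumes equal-length sequences)."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def build_tree(seqs):
    """Average-linkage clustering over a dict of name -> sequence.

    Returns a nested tuple of names; the closest sequences end up
    nested most deeply together.
    """
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }

    def cluster_dist(c1, c2):
        # Mean distance over all cross-cluster pairs of members.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters that are currently closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


if __name__ == "__main__":
    print(build_tree({"a": "GATTACA", "b": "GATTAGA", "c": "CCTTCCC"}))
```

The nested tuple is only a stand-in for a real tree format; it carries the branching order but no branch lengths.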

I hope this adds some fuel to your fire.

yet another biologist hacking perl....
