Inexact string matching is still an active field of research both on the algorithmic and statistical fronts. Some of the more popular algorithms in bioinformatics index the strings and use heuristics to speed up the search. There are also optimizations tied to particular hardware features such as vector units and FPGAs. Most of the programs (eg. BLAST, BLAT, CrossMatch, Exonerate, FASTA, etc.) are written in C with a few exceptions (eg. PatternHunter uses Java).
The concept of an edit distance is useful if you assume that all symbols are equally probable and substitutable, and there are no context dependencies. Biological sequences are not strings, however, and if one is looking for something biological, it often behooves one to use a stochastic grammar rather than a regular one. This is why hidden Markov models have become so popular. In protein space, HMMER is a popular software package (also written in C). Not many people use HMMs in nucleotide space, but they should.
These books might help
"Biological Sequence Analysis: A Probabilistic Approach": Durbin, Eddy, Krogh, and Mitchison, Cambridge University Press
"BLAST": Korf, Yandell, and Bedell, O'Reilly & Associates
"Algorithms on Strings, Trees, and Sequences": Gusfield, Cambridge University Press
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.