in reply to Re^3: Tips on how to perform this regex query
in thread Tips on how to perform this regex query

Well, besides DNA matching, maybe typo correcting?
Update: I have also read some papers on the correlation between edit distance and understanding people with speech impediments.
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Re^5: Tips on how to perform this regex query
by BrowserUk (Patriarch) on Jan 11, 2014 at 12:56 UTC
    besides DNA matching

    Sorry, but do you have any evidence or example of actual DNA problems being solved or assisted by running edit distance on subsequences?

    Whilst computational biology is definitely not my field, I've seen no worthwhile use of edit distance for genomic work beyond using it as a very crude first-pass selection mechanism; and even for that it is almost useless, as it selects on entirely the wrong criteria.

    It is about as much use as comparing paintings by weighing them.

    As always, I'm more than happy to be proved wrong on this. (But neither opinion nor hearsay is proof!)

    maybe typo correcting?

    Run on pairs of individual words, it can sometimes be vaguely useful, but given that 'lead' and 'gold' have an edit distance of 3 (or 75%) -- and so do all these 2,382 other 4-character words:

    -- so what does that mean? IMO it is an almost, if not entirely, useless metric.
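For anyone who wants to check the 'lead'/'gold' figure, here is a minimal sketch of the plain Levenshtein distance in Python. The function name `levenshtein` and the choice to normalise by the longer word's length are my assumptions for illustration; the post above does not say which implementation or word list was used.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance: unit-cost insert, delete, substitute."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


d = levenshtein("lead", "gold")
ratio = d / max(len("lead"), len("gold"))  # assumed normalisation
print(d, ratio)
```

Only the final 'd' lines up, so the other three letters are all substitutions, which is where the 3 (75%) comes from.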

    And if you use it (any edit distance algorithm) on pairs of strings containing several words, the result *is* completely meaningless in any word-formation, language-structure, or typographical sense.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Edit distance as a measure of loanword adaptation. See the cited papers too; that's how I got to the one about speech impediments.

      Bioinformatics uses a bit more complex edit distance measures than the traditional one.

      Typos: an article about finding likely typos with edit distance.

      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        Edit distance as a measure of loanword adaptation.

        That links to a corrupted (or non-) PDF.

        Bioinformatics uses a bit more complex edit distance measures than the traditional one.

        That's really understating the difference. From the link you provided:

        In a constant gap penalty, every gap receives some predetermined constant penalty, regardless of its length. Thus, the insertion or deletion of 1000 contiguous symbols is penalized equally to that of a single symbol.

        That "constant gap penalty" completely changes the dynamics of the algorithm, and thus makes the modified algorithm useful for finding alignments of subsequences, which the standard edit distance is completely useless for.
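To make the difference concrete, here is a rough sketch of a global alignment cost with a constant gap penalty, written as a three-state dynamic programme (in the style of Gotoh's affine-gap scheme with a zero extension cost). The function name `constant_gap_cost` and the costs `mismatch=1` and `gap=2` are assumptions chosen for illustration, not values taken from the linked page or from any real bioinformatics tool.

```python
import math


def constant_gap_cost(a, b, mismatch=1, gap=2):
    """Minimum alignment cost where each contiguous gap costs `gap`,
    however long it is (illustrative costs, not from any real tool)."""
    inf = math.inf
    n, m = len(a), len(b)
    # M: ends with a[i-1] aligned to b[j-1]
    # X: ends inside a gap in b (a[i-1] aligned to nothing)
    # Y: ends inside a gap in a (b[j-1] aligned to nothing)
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    X = [[inf] * (m + 1) for _ in range(n + 1)]
    Y = [[inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap  # one leading gap, any length
    for j in range(1, m + 1):
        Y[0][j] = gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            # Opening a gap costs `gap`; extending an open one is free.
            X[i][j] = min(M[i - 1][j] + gap, X[i - 1][j], Y[i - 1][j] + gap)
            Y[i][j] = min(M[i][j - 1] + gap, Y[i][j - 1], X[i][j - 1] + gap)
    return min(M[n][m], X[n][m], Y[n][m])


print(constant_gap_cost("ACGT", "ACGGGGGGT"))
```

Here the five extra G's form a single gap, so the whole insertion is charged once. Plain Levenshtein would charge each of the five inserted symbols separately.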

        Typos: an article about finding likely typos with edit distance.

        The Damerau–Levenshtein algorithm is quite different from the Levenshtein algorithm: it allows (and measures) transpositions, which plain Levenshtein does not.
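A small sketch of what the transposition rule changes. Note this is the restricted variant (often called optimal string alignment), not the full unrestricted Damerau–Levenshtein, and I don't know which one the linked article used. The function name `osa_distance` and its `transpositions` switch are hypothetical, added here so both behaviours can be compared side by side.

```python
def osa_distance(a, b, transpositions=True):
    """Optimal string alignment distance (restricted Damerau-Levenshtein).
    With transpositions=False this reduces to plain Levenshtein."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete everything in a[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert everything in b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Extra rule: swapping two adjacent characters counts as one edit.
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


print(osa_distance("teh", "the"))                        # with transpositions
print(osa_distance("teh", "the", transpositions=False))  # plain Levenshtein
```

The swapped "eh"/"he" is a single edit under the transposition rule, but two substitutions without it.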

