
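It depends on whether you count deletion and insertion as possible operations, too. But if you do not, and only substitutions count, xor is a good trick: matching characters XOR to zero, so counting the non-zero results gives the number of positions at which two equal-length strings differ.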
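A minimal Python sketch of the xor trick follows. It assumes both strings have the same length and that only substitutions count. `xor_distance` is a hypothetical helper name, not a library function. In Perl the same idea is commonly written as `($s1 ^ $s2) =~ tr/\0//c`, which XORs the two strings and counts the characters of the result that are not NUL.

```python
def xor_distance(a: str, b: str) -> int:
    """Substitution-only (Hamming) distance via XOR.

    ord(x) ^ ord(y) is zero exactly when the two characters are equal,
    so counting the non-zero results counts the differing positions.
    """
    if len(a) != len(b):
        # The trick is only meaningful when positions line up one-to-one.
        raise ValueError("xor trick needs strings of equal length")
    return sum(1 for x, y in zip(a, b) if ord(x) ^ ord(y))


if __name__ == "__main__":
    print(xor_distance("lead", "gold"))  # prints: 3
    print(xor_distance("abcd", "bcda"))  # prints: 4
```

The second call shows why it matters whether insertions and deletions count: every position of "abcd" and "bcda" differs, so the xor count is 4, yet deleting the leading "a" and appending it again takes only two edits.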
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Re^3: Tips on how to perform this regex query
by BrowserUk (Patriarch) on Jan 11, 2014 at 11:31 UTC
    It depends on whether you count deletion and insertion as possible operations, too.

    Have you found an application where 'edit distance' is actually meaningful?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Well, besides DNA matching, maybe typo correcting?
      Update: I have also read some papers on the correlation between edit distance and how well the speech of people with speech impediments is understood.
        besides DNA matching

        Sorry, but do you have any evidence or example of actual DNA problems being solved or assisted by running edit distance on subsequences?

        Whilst computational biology is definitely not my field, I've seen no useful application of edit distance to genomic work, beyond using it as a very crude first-pass selection mechanism; and even for that it is almost useless, as it selects on entirely the wrong criteria.

        About as much use as comparing paintings by weighing them.

        As always, I'm more than happy to be proved wrong on this. (But neither opinion nor hearsay is proof!)

        maybe typo correcting?

        Run on pairs of individual words, it can sometimes be vaguely useful; but 'lead' and 'gold' have an edit distance of 3 (or 75%), and so do all these 2,382 other 4-character words:

        So what does that mean? IMO it is an almost, if not entirely, useless metric.
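For reference, here is a sketch of the textbook dynamic-programming Levenshtein distance in Python, which counts insertions, deletions and substitutions. `levenshtein` is a hypothetical helper name, not a library function. It confirms that 'lead' and 'gold' are 3 edits apart; reading the 75% as that distance divided by the word length of 4 is an assumption about how the percentage was computed.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic two-row DP)."""
    # Before processing a[i-1], prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    d = levenshtein("lead", "gold")
    # Assumed normalisation: distance divided by the longer word's length.
    print(d, d / max(len("lead"), len("gold")))  # prints: 3 0.75
```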

        And if you use it (any edit distance algorithm) on pairs of strings containing several words, the result *is* completely meaningless in any word-formation, language-structure, or typographical sense.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.