in reply to Verifying a quote matches (closely enough) a source URI

How about parsing the quote into grammatical pieces? You could then compare the parse trees for similarity in addition to looking at what text has changed.

If the sentence can't be parsed at all, that's a good hint that it is spam. Otherwise, having a tree handy would make it easy to provide hints and highlights for the human editor to look at, saving them time.
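
To make the idea concrete, here is a rough sketch in Python. It assumes you already have parse trees from whatever parser you choose; the trees below are hand-written nested tuples, and the names flatten, tree_similarity and changed_spans are made up for this illustration. It flattens each tree into a sequence of labels and words, scores the two sequences with difflib, and lists the spans that differ so they can be highlighted for the editor.

```python
from difflib import SequenceMatcher

# Assumption: a parse tree is a nested tuple (label, child, child, ...),
# with words as plain strings. A real parser would produce these for you.

def flatten(tree):
    """Return the pre-order list of labels and words in a tree."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    out = [label]
    for child in children:
        out.extend(flatten(child))
    return out

def tree_similarity(a, b):
    """Score from 0.0 to 1.0 based on the flattened trees."""
    return SequenceMatcher(None, flatten(a), flatten(b)).ratio()

def changed_spans(a, b):
    """List (tag, old_items, new_items) for every non-matching span."""
    fa, fb = flatten(a), flatten(b)
    matcher = SequenceMatcher(None, fa, fb)
    return [
        (tag, fa[i1:i2], fb[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Hand-written example trees (hypothetical data, not parser output).
source = ("S", ("NP", "the", "cat"),
          ("VP", "sat", ("PP", "on", ("NP", "the", "mat"))))
quote = ("S", ("NP", "the", "cat"),
         ("VP", "sat", ("PP", "on", ("NP", "a", "mat"))))

print(tree_similarity(source, quote))
print(changed_spans(source, quote))
```

Flattening throws away some of the tree structure, so this is only a cheap approximation of tree similarity; a proper tree edit distance would be more faithful, at the cost of more code. The threshold for "closely enough" is something you would have to tune against real quotes.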