Excellent! I will try it later today on a large corpus of documents to see if I can spot any exceptions, and I will come back with feedback.
Multiple occurrences:
If the sentence contains two or more consecutive identical words, it doesn't matter which one is marked "new" and which one is marked "moved".
Real-world problem:
This program is meant to help detect changes in the semantics of a corpus of similar documents.
Since changes in word order and newly added words are the first candidates for a semantic modification, I need such a program to detect them and show them in parallel.
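To make the requirement concrete, here is a minimal sketch of the kind of output I have in mind. It is not the code posted in this thread: it is written in Python with the standard `difflib` module, and the function name `classify_words` and the labels "same", "moved" and "new" are my own assumptions for illustration. A word of the new sentence is labelled "moved" if it was left unmatched in the old sentence, and "new" otherwise. For repeated words, which copy gets which label is arbitrary, as described above.

```python
from collections import Counter
from difflib import SequenceMatcher


def classify_words(old: str, new: str) -> list[tuple[str, str]]:
    """Label each word of `new` as 'same', 'moved' or 'new' relative to `old`.

    Hypothetical helper for illustration only; splitting on whitespace
    is a simplifying assumption (no punctuation or case handling).
    """
    old_words, new_words = old.split(), new.split()
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    opcodes = matcher.get_opcodes()

    # Count the old words that were not matched in place; a word of the
    # new sentence can only be "moved" if one of these is still available.
    unmatched_old = Counter()
    for tag, i1, i2, _j1, _j2 in opcodes:
        if tag in ("delete", "replace"):
            unmatched_old.update(old_words[i1:i2])

    labelled = []
    for tag, _i1, _i2, j1, j2 in opcodes:
        if tag == "equal":
            labelled.extend((word, "same") for word in new_words[j1:j2])
        elif tag in ("insert", "replace"):
            for word in new_words[j1:j2]:
                if unmatched_old[word] > 0:
                    unmatched_old[word] -= 1  # consume one old occurrence
                    labelled.append((word, "moved"))
                else:
                    labelled.append((word, "new"))
    return labelled


if __name__ == "__main__":
    for word, label in classify_words("the quick brown fox",
                                      "the brown quick red fox"):
        print(f"{label:5} {word}")
```

Printing the label next to each word gives a rough side-by-side view of what changed. Which of two swapped words is reported as "moved" depends on how `difflib` aligns the sentences, so I would treat that choice as arbitrary too.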
In reply to Re^4: diff of two strings
by flaviusm
in thread diff of two strings
by flaviusm