in reply to comparing sentences

Re^2: comparing sentences
by cntrtrst (Initiate) on Oct 30, 2016 at 05:44 UTC
    Lots of good stuff. Thank you all for your input.

    A couple of you asked for samples. There’s nothing unusual about the texts that will be used. I plan to test with texts from Wikipedia by introducing misspellings, deletions, additions and changes in punctuation. Of course that still raises the question: how much can you alter a sentence before it becomes something else? Maybe I should be asking a different kind of monk about that. :)

    Nevertheless I’ve included some texts below just to give a broad sense of what I expect to see. These are all from: https://en.wikipedia.org/wiki/Human_rights.

    There’s flexibility in how much of the text the algorithm needs to see before it has to make a determination. Sentence by sentence is probably fine as a first approximation.

    Certainly Levenshtein distance looks worthy of study and String::Approx looks very interesting as well, along with a few more suggestions made in the String::Approx description on CPAN. I’ll have to experiment with all of this and see where it gets me. And I have to beg your pardon - it could take a while before I can comment further on these suggestions.
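
    To make the Levenshtein idea concrete, here is a minimal pure-Perl sketch of the kind of thing I expect to start from. It is only an illustration, not what String::Approx does internally: the sub names levenshtein and similarity are my own, and dividing by the length of the longer string is just one assumed way to turn the distance into a 0..1 score.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic dynamic-programming Levenshtein distance between two strings,
# keeping only the previous and current rows of the table.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # Row 0: turning the empty prefix of $s into the first j chars of $t.
    my @prev = (0 .. scalar @t);

    for my $i (1 .. scalar @s) {
        my @cur = ($i);
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Hypothetical normalisation: 1 means identical, 0 means nothing in common.
sub similarity {
    my ($s, $t) = @_;
    my $longest = length($s) > length($t) ? length($s) : length($t);
    return 1 if $longest == 0;
    return 1 - levenshtein($s, $t) / $longest;
}

# Made-up sample pair: a misspelling and a dropped full stop.
my $original = "Human rights are moral principles.";
my $altered  = "Human rigths are moral principles";
printf "distance=%d similarity=%.2f\n",
    levenshtein($original, $altered), similarity($original, $altered);
```

    Where to put the cut-off between "same sentence" and "something else" is exactly the open question above, so I have deliberately left any threshold out of the sketch.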

    > And if you want to go hardcore on the problem: “wordnet”
    It’s not so far-fetched. At least, some effort to do grammatical parsing or look at sentence structure could be helpful. I’ve had good experiences with Lingua::LinkParser, and it can be a way to look at an abstraction of the sentence instead of at the sentence itself, though it's probably too much overhead for this application.