Lots of good stuff. Thank you all for your inputs.
A couple of you asked for samples. There’s nothing unusual about the texts that will be used. I plan to test with texts from Wikipedia by introducing misspellings, deletions, additions and changes in punctuation. Of course that still raises the question: how much can you alter a sentence before it becomes something else? Maybe I should be asking a different kind of monk about that. :)
Nevertheless I’ve included some texts below just to give a broad sense of what I expect to see. These are all from: https://en.wikipedia.org/wiki/Human_rights.
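For what it’s worth, here’s roughly how I picture generating the mangled test copies from those texts - just a sketch, with an arbitrary edit count and character pool, nothing final:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sketch of a test-case generator: take a clean sentence and return a
    # copy with a few random edits of the kinds mentioned above
    # (misspellings, deletions, insertions, punctuation changes).
    sub mangle {
        my ($sentence, $edits) = @_;
        $edits //= 3;
        my @chars = split //, $sentence;
        my @pool  = ('a' .. 'z', ',', '.', ';', ' ');
        for (1 .. $edits) {
            my $pos = int rand @chars;
            my $op  = int rand 3;
            if    ($op == 0) { $chars[$pos] = $pool[int rand @pool] }           # substitution
            elsif ($op == 1) { splice @chars, $pos, 1 }                         # deletion
            else             { splice @chars, $pos, 0, $pool[int rand @pool] }  # insertion
        }
        return join '', @chars;
    }

    my $original = "All human beings are born free and equal in dignity and rights.";
    print mangle($original, 4), "\n";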
There’s flexibility on how far into the text the algorithm has to read before making a determination. Sentence by sentence is probably a reasonable first approximation.
Certainly Levenshtein distance looks worthy of study, and String::Approx looks very interesting as well, along with a few more suggestions made in the String::Approx documentation on CPAN. I’ll have to experiment with all of this and see where it gets me. And I have to beg your pardon - it could take a while before I can comment further on these suggestions.
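In the meantime, the first experiment will probably be something as blunt as a normalized edit distance per sentence, roughly like this (using Text::Levenshtein here just because it’s a straightforward implementation; the 0.2 threshold is a pure guess that will need tuning):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::Levenshtein qw(distance);
    use List::Util qw(max);

    # Crude sentence-by-sentence test: treat two sentences as "the same"
    # when the edit distance is a small fraction of the longer one's length.
    sub same_sentence {
        my ($s1, $s2, $threshold) = @_;
        $threshold //= 0.2;
        my $dist = distance($s1, $s2);
        my $len  = max(length $s1, length $s2) || 1;
        return ($dist / $len) <= $threshold;
    }

    my $orig    = "Human rights are moral principles or norms for certain standards of human behaviour.";
    my $mangled = "Humann rights are moral principls or norms, for certain standards of human behavior.";

    my $d = distance($orig, $mangled);
    print "distance: $d, same sentence? ",
          same_sentence($orig, $mangled) ? "yes" : "no", "\n";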
> And if you want to go hardcore on the problem: “wordnet”
It’s not so far-fetched. At the very least, some effort to do grammatical parsing or look at sentence structure could be helpful. I’ve had good experiences with Lingua::LinkParser, and it can be a way to look at the abstraction of the sentence instead of at the sentence itself, though it's probably too much overhead for this application.
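For completeness, this is roughly what that route looks like - I’m quoting the usage from memory, so check the module’s docs for the exact method names - the idea being to compare the parse diagrams rather than the raw strings:

    use strict;
    use warnings;
    use Lingua::LinkParser;

    # From memory of the Lingua::LinkParser synopsis; method names may be
    # slightly off. The point is to compare parse structure (the abstraction)
    # rather than the literal sentence text.
    my $parser   = Lingua::LinkParser->new;
    my $sentence = $parser->create_sentence("Human rights are moral principles or norms.");

    foreach my $linkage ($sentence->linkages) {
        print $parser->get_diagram($linkage);
    }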