
If you want general-purpose NLP (GPNLP), it's a _hard_ problem, as the posts above say. You could instead leverage the commonality in the output format and recognise that you are looking for a far smaller subset of sentences than a general-purpose NLP system has to handle. That way you have some hope of doing it in practice. Of course the method will always be brittle, but as you say, you have module Carbon::Life::Mammal::Human to help postprocess. I'd start from these assumptions:
1) You are only trying to parse _relationships_.
2) Each relationship you are looking for is either an ISA or a HASA relationship.
3) All final relationships are of the binary form x R y, where R is the relationship between x and y.
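To make those three assumptions concrete, here is a minimal Python sketch (my own illustration, not taken from any NLP library). Every fact is stored as a triple (x, R, y) and R is restricted to ISA or HASA. The names `Relation`, `ALLOWED` and `make_relation` are hypothetical, and the sample facts are only there to show the shape of the data.

```python
from collections import namedtuple

# Hypothetical container for one binary relationship "x R y".
Relation = namedtuple("Relation", ["x", "r", "y"])

# The only two relationship kinds assumed in the list above.
ALLOWED = {"ISA", "HASA"}

def make_relation(x, r, y):
    """Build a triple, rejecting anything outside the ISA/HASA subset."""
    r = r.upper()
    if r not in ALLOWED:
        raise ValueError("unsupported relationship: " + r)
    return Relation(x, r, y)

# A set of triples lets one entity take part in several relationships.
facts = {
    make_relation("Seoul", "isa", "city"),
    make_relation("Seoul", "hasa", "population over 10.2M"),
}
```

Keeping the facts in a plain set means duplicates collapse on their own and nothing stops the same x from appearing in more than one triple.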
My 'heuristic beard stroking algorithm' would be:
1) partition the whole token set into Entities and Relationships. Do this by pulling out all the proper nouns to start with.
2) find and deconstruct the non-trivial compound entities to remove qualifiers and break open sets such as 'Three other cities, x, y and z'
3) Apply simple set math to determine the membership of each entity for each relationship.
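The three steps above could be sketched roughly like this in Python. It is a toy under loud assumptions: the cue phrases "is a", "is an" and "has" stand in for ISA and HASA, a capital letter stands in for "proper noun", and the sample sentences are made up. The names `REL_PATTERN`, `KIND`, `split_compound` and `extract` are all hypothetical.

```python
import re
from collections import defaultdict

# Step 1 (assumption): a sentence is "<entity> <cue> <entity>", where the
# cue phrase tells us which relationship it is.
REL_PATTERN = re.compile(r"^(?P<x>.+?)\s+(?P<r>is an?|has)\s+(?P<y>.+?)\.?$")
KIND = {"is a": "ISA", "is an": "ISA", "has": "HASA"}

def split_compound(phrase):
    """Step 2: break open a set such as 'three other cities, X, Y and Z'."""
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", phrase.strip())
    parts = [p.strip() for p in parts if p.strip()]
    # Crude qualifier removal: a lower-case lead-in before a list is dropped.
    if len(parts) > 1 and not parts[0][0].isupper():
        parts = parts[1:]
    return parts

def extract(sentences):
    """Step 3: collect (x, y) pairs into one set per relationship kind."""
    members = defaultdict(set)
    for s in sentences:
        m = REL_PATTERN.match(s.strip())
        if not m:
            continue  # unrecognised: leave it for the human postprocessor
        kind = KIND[m.group("r")]
        for x in split_compound(m.group("x")):
            for y in split_compound(m.group("y")):
                members[kind].add((x, y))
    return members

# Made-up sample input, purely for illustration.
sentences = [
    "Seoul is a city.",
    "Seoul has a population over 10.2M.",
    "Korea has three other cities, X, Y and Z.",
]
members = extract(sentences)

# Set math: which entities appear on the left of both kinds of relationship?
both = {x for x, _ in members["ISA"]} & {x for x, _ in members["HASA"]}
```

Because each relationship kind maps to a set of pairs rather than a single value, an entity can sit in as many relationships as the text gives it, and ordinary set operations answer the membership questions.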
The biggest challenge you might have is moving from n->1 to n->n relationships. It's easy if every entity has just one relationship, but Seoul being both Korea's capital and a city with a population of >10.2M is the stumbler, imho. Don't forget to account for unary attributes (Seoul is rainy), which don't involve another entity. As you say you have looked at some NLP, go back and read, read, read, and there will be an answer lurking in there somewhere. Just don't try to generalise the problem too much or it will explode; the best route to practical NLP is to cheat. :)
Good luck,
Andy

In reply to Re: The (futile?) quest for an automatic paraphrase engine by andyf
in thread The (futile?) quest for an automatic paraphrase engine by dimar
