punkish has asked for the wisdom of the Perl Monks concerning the following question:
My project involves carbon-based life forms annotating text files with named entities (people, organizations, places, etc.) to create "ground truth" that can then be fed to someone else's programmatic annotator to make it smarter.
The work done thus far (before I joined the project) was using Callisto, a Java annotator created by Mitre Corp. The results were less than satisfying, and besides, Callisto ain't open source.
I have been looking at Wordfreak, which, besides having a cool name, is open source.
One problem -- both of the above are Java programs, something I don't know beans about. Although this is not exigent, I would like to write a web-based interface for human annotation of text files. The human expert goes to my application and uploads her text file. The program rips through it and presents the text in one frame, while a popup widget shows the available entity types (customizable, of course). The user then selects words, one by one, in the text frame and chooses the applicable entity type in the entity frame. When she is finished, the program generates an XML-ish annotation file. Of course, I would start with Lingua::EN::NamedEntity as the backend.
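To make the last step concrete, here is a minimal sketch of how the user's selections might be turned into an XML-ish annotation. Everything in it is an assumption for illustration: the `annotate` and `xml_escape` subs, the `<entity type="...">` element, and the `start`/`end`/`type` span keys are hypothetical names, and the spans are taken to be character offsets sent back by the web front end. It does not call Lingua::EN::NamedEntity; its suggestions could be converted to the same span structure to pre-fill the selections.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Wrap each selected span in a (hypothetical) <entity type="..."> element.
# $spans is an array ref of { start, end, type } hashes holding character
# offsets into $text; end is exclusive. Spans that overlap an earlier span
# or fall outside the text are skipped.
sub annotate {
    my ($text, $spans) = @_;
    my $out = '';
    my $pos = 0;
    for my $s (sort { $a->{start} <=> $b->{start} } @$spans) {
        next if $s->{start} < $pos;            # overlaps a span already emitted
        next if $s->{end} > length($text);     # runs past the end of the text
        next if $s->{end} <= $s->{start};      # empty or inverted span
        # Plain text between the previous span and this one.
        $out .= xml_escape(substr($text, $pos, $s->{start} - $pos));
        $out .= sprintf '<entity type="%s">%s</entity>',
            xml_escape($s->{type}),
            xml_escape(substr($text, $s->{start}, $s->{end} - $s->{start}));
        $pos = $s->{end};
    }
    # Whatever text follows the last span.
    $out .= xml_escape(substr($text, $pos));
    return $out;
}

my $text  = 'Callisto was created by Mitre Corp.';
my @spans = (
    { start => 24, end => 35, type => 'organization' },
    { start => 0,  end => 8,  type => 'product' },
);
print annotate($text, \@spans), "\n";
```

Keeping the selections as offsets rather than editing the text in place means the original file stays untouched, and the same spans could later be written out in whatever format the downstream annotator expects.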
Ok. So, before I embark on this, any monks aware of this having been done already? Any other thoughts, gotchas, caveats?
Re: web-and-perl-based Named Entity annotator
by jbert (Priest) on Oct 13, 2006 at 09:32 UTC
Re: web-and-perl-based Named Entity annotator
by rlucas (Scribe) on Oct 15, 2006 at 16:15 UTC
Re: web-and-perl-based Named Entity annotator
by planetscape (Chancellor) on Oct 16, 2006 at 08:34 UTC