My project involves carbon-based life forms annotating text files with named entities (people, organizations, places, etc.) to create "ground truth" that can then be fed to someone else's programmatic annotator to make it smarter.
The work done thus far (before I joined the project) used Callisto, a Java annotator created by MITRE Corp. The results were less than satisfying, and besides, Callisto ain't open source.
I have been looking at Wordfreak, which, besides having a cool name, is open source.
One problem: both of the above are Java programs, and I don't know beans about Java. Although it is not urgent, I would like to write a web-based interface for human annotation of text files. The human expert goes to my application and uploads her text file. The program rips through it and presents the text in one frame, while a popup widget shows the available entity types (customizable, of course). The user then selects words, one by one, in the text frame and chooses the applicable entity type in the entity frame. When she is finished, the program generates an XML-ish annotation file. Of course, I would start with Lingua::EN::NamedEntity as the backend.
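To make the last step concrete, here is a minimal sketch of how the "XML-ish" output might be generated once the user's selections are known. Everything in it is an assumption for illustration: the sub names `annotate` and `xml_escape`, the `<document>` and `<entity type="...">` element names, and the idea that the web front end hands back each selection as a character-offset range plus an entity type. It uses core Perl only and does not call Lingua::EN::NamedEntity; that module could presumably supply the initial suggestions that the human then corrects.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Turn a text plus a list of annotations into inline XML-ish markup.
# Each annotation is a hashref { start => N, end => M, type => 'person' }
# holding character offsets, with 'end' exclusive (hypothetical format).
# Overlapping annotations are rejected; nesting is not handled here.
sub annotate {
    my ($text, @annotations) = @_;
    my $out = '';
    my $pos = 0;    # offset of the first character not yet emitted
    for my $ann (sort { $a->{start} <=> $b->{start} } @annotations) {
        die "bad or overlapping annotation at offset $ann->{start}\n"
            if $ann->{start} < $pos
            || $ann->{end} <= $ann->{start}
            || $ann->{end} > length($text);
        # Plain text between the previous entity and this one.
        $out .= xml_escape(substr($text, $pos, $ann->{start} - $pos));
        # The entity itself, wrapped in a typed element.
        $out .= sprintf '<entity type="%s">%s</entity>',
            xml_escape($ann->{type}),
            xml_escape(substr($text, $ann->{start}, $ann->{end} - $ann->{start}));
        $pos = $ann->{end};
    }
    $out .= xml_escape(substr($text, $pos));    # trailing plain text
    return "<document>$out</document>";
}

# Example: three selections a user might have made in the text frame.
my $text = 'Alice Smith joined Acme & Co in Paris.';
print annotate(
    $text,
    { start => 0,  end => 11, type => 'person' },
    { start => 19, end => 28, type => 'organization' },
    { start => 32, end => 37, type => 'place' },
), "\n";
```

Storing offsets rather than marked-up text keeps the uploaded file untouched, so the same annotations could later be written out as standoff markup instead of inline tags if the downstream annotator prefers that.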
OK. So, before I embark on this, are any monks aware of this having been done already? Any other thoughts, gotchas, or caveats?
In reply to web-and-perl-based Named Entity annotator by punkish