NLP monks,

My project involves carbon-based life forms annotating text files with named entities (people, organizations, places, etc.) to create "ground truth" that can then be fed to someone else's programmatic annotator to make it smarter.

The work done so far (before I joined the project) used Callisto, a Java annotator created by the MITRE Corporation. The results were less than satisfying, and besides, Callisto ain't open source.

I have been looking at WordFreak, which, besides having a cool name, is open source.

One problem: both of the above are Java programs, and Java is something I don't know beans about. Although this is not exigent, I would like to write a web-based interface for human annotation of text files. The human expert goes to my application and uploads her text file; the program rips through it and presents the text in one frame, while a popup widget shows the available entity types (customizable, of course). She then selects words one by one in the text frame and chooses the applicable entity type in the entity frame, and when she is finished, the program generates an XML-ish annotation file. Of course, I would start with Lingua::EN::NamedEntity as the backend.
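A minimal sketch of that last step, assuming the interface records each selection as character offsets into the uploaded text plus an entity type. The helper `annotate_inline`, the span layout, and the MUC-style inline `<ENAMEX TYPE="...">` tags are illustrative assumptions, not the output format of Callisto, WordFreak, or Lingua::EN::NamedEntity.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: turn the annotator's selections into inline,
# MUC-style ENAMEX markup. Each span is assumed to look like
# { start => 0, end => 8, type => 'PERSON' }, with character offsets
# into $text (end exclusive). Overlapping spans are rejected.
sub annotate_inline {
    my ($text, $spans) = @_;
    my @sorted = sort { $a->{start} <=> $b->{start} } @$spans;
    my $out = '';
    my $pos = 0;
    for my $span (@sorted) {
        die "overlapping span at $span->{start}\n" if $span->{start} < $pos;
        # Plain text between the previous entity and this one.
        $out .= xml_escape(substr($text, $pos, $span->{start} - $pos));
        my $surface = substr($text, $span->{start}, $span->{end} - $span->{start});
        $out .= sprintf '<ENAMEX TYPE="%s">%s</ENAMEX>',
            xml_escape($span->{type}), xml_escape($surface);
        $pos = $span->{end};
    }
    # Whatever follows the last entity.
    $out .= xml_escape(substr($text, $pos));
    return "<DOC>$out</DOC>\n";
}

my $text  = 'Jane Doe joined MITRE in Bedford.';
my @spans = (
    { start => 0,  end => 8,  type => 'PERSON' },
    { start => 16, end => 21, type => 'ORGANIZATION' },
    { start => 25, end => 32, type => 'LOCATION' },
);
print annotate_inline($text, \@spans);
```

Working from offsets rather than editing the text as the user clicks keeps the uploaded file untouched until the end; a standoff format could store the same spans in a separate file instead of inlining the tags.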

OK. So, before I embark on this, is any monk aware of this having been done already? Any other thoughts, gotchas, caveats?

--

when small people start casting long shadows, it is time to go to bed

In reply to web-and-perl-based Named Entity annotator by punkish