So I'm a week into writing a $#@%! load of Perl that identifies people's names on web sites when I realize that I should query you guys to see if someone's done this already and is willing to share code. I know people have done this already for applications like identifying people's names in news articles, but my internet search led down paths that always ended with proprietary and expensive software.
I'm building a search site for art images. The backend spiders millions of art images on thousands of gallery web sites and then tries to identify which snippet of text on each web page is the artist's name. I'm structuring the data associated with each image, not indexing it, so I have to pull out first name, middle initial, and last name as three separate fields from all the other text. So far my code works on a few sites (without templates), but it needs at least another week of testing and tuning to work on a wide variety of sites, since the matching technique is fuzzy and has to be trained on a wide variety of cases.
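To make the field-splitting part concrete, here's a rough sketch of the kind of thing I mean. It's in Python rather than my actual Perl, just to keep it short, and the pattern and the function name (candidate_names) are made up for illustration; my real matcher is fuzzier than this. It only assumes a capitalized first name, an optional single-letter middle initial, and a capitalized last name.

```python
import re

# Hypothetical pattern (an assumption, not my production code):
# capitalized first name, optional middle initial, capitalized last name.
NAME_RE = re.compile(
    r"\b(?P<first>[A-Z][a-z]+)\s+"               # first name
    r"(?:(?P<mi>[A-Z])\.?\s+)?"                  # optional middle initial, with or without a period
    r"(?P<last>[A-Z][a-z]+(?:-[A-Z][a-z]+)?)\b"  # last name, optionally hyphenated
)

def candidate_names(text):
    """Return a (first, middle_initial, last) tuple for each candidate in text."""
    return [
        (m.group("first"), m.group("mi") or "", m.group("last"))
        for m in NAME_RE.finditer(text)
    ]

print(candidate_names("painting by Mary J. Cassatt, 1893"))
```

A pattern like this also fires on any run of capitalized words ("Oil On Canvas" would match), which is exactly why the hard part is the scoring and tuning that has to happen afterward.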
Has anyone heard of open-source software that can shorten the drudgery? Even if the software was built for another application like news articles, it might be faster to tweak it for my application.