in reply to Apache solr vs Apache Lucy

As long as you're looking into solutions, also consider Search::Elasticsearch. While Elasticsearch isn't recommended as your primary data store, it seems to be the solution currently en vogue for searching documents.

I've looked at Apache Tika for doing the text extraction from various documents, but I haven't found any module specifically talking to Tika.