in reply to Full Text Search

I've given this some thought. Here's one approach, with a focus on ease of implementation:

  1. Use pdftotext, which I believe is provided by xpdf, to get the text out of each PDF. This works for compressed ("optimized" in Adobe-speak) PDFs too. Put this text into a MySQL database, or dump it out to the filesystem and index it with htdig.
  2. Then either use MySQL's fulltext index feature (which is limited in its features, but still damn fast and pretty good) to create your search index, or write your own MySQL search query.
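The two steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a drop-in implementation: it assumes `pdftotext` (from xpdf or poppler-utils) is on the PATH, and it uses the fact that passing `-` as the output file makes pdftotext write to stdout. For step 2 it swaps in a small pure-Python inverted index as a stand-in for the "write your own search query" option, so it runs without a MySQL server. `TextIndex`, `extract_text`, and `index_directory` are hypothetical names made up for this sketch.

```python
import re
import subprocess
from collections import defaultdict
from pathlib import Path


def extract_text(pdf_path):
    """Step 1: run pdftotext and capture its output.

    Assumes pdftotext is installed; "-" as the output file sends text to stdout,
    and "-enc UTF-8" asks for UTF-8 output.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def tokenize(text):
    # Lowercased runs of ASCII letters/digits: a crude stand-in for a real word parser.
    return re.findall(r"[a-z0-9]+", text.lower())


class TextIndex:
    """Hypothetical in-memory inverted index standing in for a database table."""

    def __init__(self):
        # word -> {doc_id: number of occurrences of word in that doc}
        self.postings = defaultdict(lambda: defaultdict(int))

    def add(self, doc_id, text):
        for word in tokenize(text):
            self.postings[word][doc_id] += 1

    def search(self, query):
        # Score each document by the summed counts of the query words it contains,
        # and return document ids best match first (ties broken by id).
        scores = defaultdict(int)
        for word in tokenize(query):
            for doc_id, count in self.postings.get(word, {}).items():
                scores[doc_id] += count
        return sorted(scores, key=lambda d: (-scores[d], d))


def index_directory(directory):
    # Hypothetical helper: extract and index every *.pdf in one directory.
    index = TextIndex()
    for pdf in sorted(Path(directory).glob("*.pdf")):
        index.add(pdf.name, extract_text(pdf))
    return index
```

With MySQL you would instead store the extracted text in a column with a `FULLTEXT` index and query it with `MATCH (...) AGAINST (...)`, which handles relevance ranking for you; the hand-rolled index only shows the idea behind that ranking.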

Re: Re: Full Text Search
by Hero Zzyzzx (Curate) on May 14, 2001 at 17:46 UTC

    While researching this, I remember seeing a search engine project on SourceForge that had code to dump text from PDFs using pdftotext. Try searching there.