Funny, that's exactly what I'm working on right now.
I'm using OCR too, though. I'm making it so you can search for terms and get back the page number and the location of the document.
I'll have a release in a few weeks (open source).
It's interface independent, but I'll include a CLI and a CGI::Application front end.
If you think you know what you're doing, I wouldn't mind sharing my outline with you. I have put some planning into it already.
The main point that may be very different for you is that the archive I am dealing with will have at least 20k PDF files, and they are changing all the time. Once the system is fed and working, I expect my data to be at most an hour old or so.
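To make the idea concrete, here is a rough sketch of the kind of index described above: a term maps to (document path, page number) pairs, and a stored modification time tells you which files need re-indexing on the next pass. This is not the code from the upcoming release. It is written in Python purely for illustration, it assumes the per-page text has already been extracted (by OCR or otherwise), and every name in it (`PageIndex`, `add_document`, `needs_reindex`, the example paths) is hypothetical.

```python
import re
from collections import defaultdict


class PageIndex:
    """Toy inverted index: term -> set of (document path, page number)."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.mtimes = {}  # path -> modification time seen at last indexing

    def add_document(self, path, pages, mtime):
        """Index one document from its per-page text (assumed already extracted)."""
        self.remove_document(path)  # drop stale entries when re-indexing
        for page_no, text in enumerate(pages, start=1):
            for term in re.findall(r"[a-z0-9]+", text.lower()):
                self.postings[term].add((path, page_no))
        self.mtimes[path] = mtime

    def remove_document(self, path):
        """Forget every posting that points at this document."""
        for term in list(self.postings):
            kept = {hit for hit in self.postings[term] if hit[0] != path}
            if kept:
                self.postings[term] = kept
            else:
                del self.postings[term]
        self.mtimes.pop(path, None)

    def needs_reindex(self, path, mtime):
        """True if the file is new or has changed since it was last indexed."""
        return self.mtimes.get(path) != mtime

    def search(self, term):
        """Return sorted (path, page number) hits for a single term."""
        return sorted(self.postings.get(term.lower(), ()))


index = PageIndex()
index.add_document("/archive/a.pdf", ["Invoice total", "payment due"], mtime=100)
index.add_document("/archive/b.pdf", ["Payment received"], mtime=100)
print(index.search("payment"))
```

A periodic job (hourly, say) could walk the archive, call `needs_reindex` with each file's current modification time, and only re-run extraction on the files that changed. With 20k files that is likely to matter, since the OCR step is probably the expensive part.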
In reply to Re: PDF Indexing / Search by leocharre, in thread PDF Indexing / Search by Trihedralguy