I used pdftotext (part of xpdf, http://www.foolabs.com/xpdf/) for a client's search engine. Yes, you have to spawn a process, but pdftotext is fast and works nicely. Since the search engine reindexes the site twice daily, I cache pdftotext's output in a text file and compare its timestamp to the PDF's, rerunning pdftotext only when the PDF is newer. Most of the time I only have to slurp in the cached text file.
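A minimal sketch of that timestamp-checked cache, in Python rather than whatever the original indexer was written in. The function names (`pdf_text`, `run_pdftotext`), the `.txt` cache suffix, and the `-enc UTF-8` option are my assumptions for illustration, not the setup described above; the only thing taken from the post is the idea of comparing the cached file's mtime to the PDF's.

```python
import os
import subprocess


def run_pdftotext(pdf_path, txt_path):
    """Spawn pdftotext to write the PDF's text into txt_path.

    Assumes a pdftotext binary is on PATH; -enc UTF-8 is an assumed choice
    so the cache can be read back with a known encoding.
    """
    subprocess.run(["pdftotext", "-enc", "UTF-8", pdf_path, txt_path], check=True)


def pdf_text(pdf_path, cache_path=None, extractor=run_pdftotext):
    """Return the text of pdf_path, reusing the cached text file when possible."""
    if cache_path is None:
        cache_path = pdf_path + ".txt"  # hypothetical naming convention

    # Re-extract only if there is no cache yet or the PDF is newer than it.
    stale = (
        not os.path.exists(cache_path)
        or os.path.getmtime(cache_path) < os.path.getmtime(pdf_path)
    )
    if stale:
        extractor(pdf_path, cache_path)

    # Slurp the cached text in one go.
    with open(cache_path, encoding="utf-8", errors="replace") as fh:
        return fh.read()
```

With a twice-daily reindex, most calls to `pdf_text("manual.pdf")` would skip the process spawn entirely and just read `manual.pdf.txt`. The `extractor` parameter is there only so the conversion step can be swapped out or stubbed.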
In reply to Re: pdf -> text by crenz, in thread pdf -> text by heezy