But to answer your specific question, I use pdftotext to extract the plain text from a compliant PDF file. It's a command-line tool that is distributed with the xpdf reader application in many Linux distributions. It won't work on scanned images (for which that PDF::OCR sounds particularly interesting; I'll have to check that out, ++ and thanks!). But for folks who export editable documents to PDF, it works like a charm (though it is challenged a bit by multi-column content).
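For anyone who wants to script this, here is a minimal sketch of calling pdftotext from Python. It assumes pdftotext is installed and on your PATH; the function names and the `report.pdf` filename are made up for illustration. The `-layout` switch asks pdftotext to keep the physical page layout, and whether that helps or hurts with multi-column pages depends on the document, so it is worth comparing both modes.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, layout=False):
    """Return the argument list for a pdftotext call that writes to stdout."""
    command = ["pdftotext"]
    if layout:
        # -layout asks pdftotext to preserve the physical page layout.
        command.append("-layout")
    # A trailing "-" sends the extracted text to stdout instead of a file.
    command += [pdf_path, "-"]
    return command


def extract_text(pdf_path, layout=False):
    """Run pdftotext and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    # Decode leniently; the output encoding can vary between builds.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `extract_text("report.pdf")` would return the text of that (hypothetical) file, and `extract_text("report.pdf", layout=True)` would do the same with layout preservation turned on. As noted above, a scanned PDF will just give you little or no text back.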
-- Hugh
In reply to Re: PDF Text by hesco, in thread PDF Text by bmac