in reply to PDF Text
But to answer your specific question, I use pdftotext to extract the ASCII text from a compliant PDF file. It's a command-line tool (run from the shell, e.g. bash) that is distributed with the xpdf reader application in many Linux distributions. It won't work on scanned images (for which that PDF::OCR sounds particularly interesting; I'll have to check that out, ++ and thanks!). But for folks who export editable documents to PDF, it works like a charm (though it is challenged a bit by multi-column content).
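For anyone who wants to script the extraction, here is a minimal sketch in Python that shells out to pdftotext. The function names build_pdftotext_cmd and extract_text are made up for illustration, and the UTF-8 decoding is an assumption about the output encoding. The sketch relies only on two documented behaviours of pdftotext: passing "-" as the text-file argument sends the output to stdout, and the -layout flag asks it to keep the physical page layout. Whether -layout helps or hurts with multi-column pages depends on the document, so treat it as something to try rather than a fix.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, layout=False):
    """Build the argument list for a pdftotext call that writes to stdout."""
    cmd = ["pdftotext"]
    if layout:
        # -layout asks pdftotext to keep the physical page layout
        cmd.append("-layout")
    # A text-file argument of "-" sends the extracted text to stdout
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, layout=False):
    """Return the extracted text; raise if pdftotext is missing or fails."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, layout),
        capture_output=True,
        check=True,  # raises CalledProcessError on a non-zero exit status
    )
    # Assumption: output is UTF-8; undecodable bytes are replaced, not fatal
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("report.pdf") (a hypothetical file name) returns the text as a string. As noted above, a scanned PDF with no text layer will not yield usable text this way.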
-- Hugh
Re^2: PDF Text
by leocharre (Priest) on Jun 13, 2008 at 13:38 UTC