Another option is to use a converter to extract the text from the PDF.
On Ubuntu, the program you want is pdftotext in the package poppler-utils, installed with:
sudo apt-get install poppler-utils
pdftotext has several options that affect the formatting of the text output, so experiment with them to see if you can improve on the text version you already have.
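As a rough sketch of that kind of experiment, the function below converts one PDF three ways so the results can be compared side by side. The function name extract_variants and the output file naming are my own invention for illustration. The -layout option (keep the physical page layout) and the -raw option (keep the content-stream order) are standard pdftotext options; see man pdftotext for the rest.

```shell
# Sketch only: extract_variants and the *.default.txt / *.layout.txt / *.raw.txt
# naming scheme are hypothetical, chosen for this example.
extract_variants() {
    pdf=$1
    base=${pdf%.pdf}    # strip a trailing ".pdf" to build the output names

    # Default mode: text in reading order.
    pdftotext "$pdf" "$base.default.txt" || return 1

    # -layout: try to preserve the physical layout (columns, tables).
    pdftotext -layout "$pdf" "$base.layout.txt" || return 1

    # -raw: keep the text in the order it appears in the content stream.
    pdftotext -raw "$pdf" "$base.raw.txt" || return 1
}
```

Called as extract_variants report.pdf (report.pdf being a placeholder name), this should leave three text files next to the PDF, which you can then compare with diff or by eye to see which mode suits your document.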
I recently used pdftotext to successfully extract the text from a PDF with several hundred pages. YMMV.
It may be worth looking to see if there are other programs capable of extracting text.
Re^3: parse pdf
by ag4ve (Monk) on Nov 06, 2010 at 01:51 UTC