in reply to PDF to Text

I don't know the answer to your question, but a Super Search for "convert pdf to text" turns up quite a few nodes on this topic, including (as a quick sample): Can I convert a pdf to html with PDF::Extract??, pdf2txt?, Extract text from PDF and Reading PDF files. A quick skim through those nodes suggests the following modules might help: PDF::Extract and PDF::API2. Searching for pdf on CPAN reveals a few more potential candidates. Good luck, and do let us know how you get on :)

--
Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

Re^2: PDF to Text
by chrism01 (Friar) on Jan 27, 2005 at 01:31 UTC
    I have actually had a look at those modules, but all they do is create or manipulate PDFs. For example, PDF::API2 has a method, $string = $pdf->stringify, but this just dumps the file into a string that is still in PDF format, i.e. you get a load of binary rubbish.
    As for PDF::Extract ("Extracting sub PDF documents from a multi page PDF document"), the output is again PDF.
    I just need the bare ASCII text that pdftotext gives, except that pdftotext has the odd random glitch that corrupts the layout of its output.
    If I can't predict the layout, I can't parse it.
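
For reference, here is a minimal sketch of the pdftotext route described above. It assumes the glitches are mostly shifts in column spacing, which the post does not confirm. The helper names squeeze_columns and pdf_to_fields and the file name report.pdf are hypothetical; -layout and the "-" output argument are standard pdftotext options. The idea is to stop depending on exact column positions by turning every run of two or more spaces into a single tab before parsing.

```shell
#!/bin/sh
# Sketch only: normalise pdftotext output so that small spacing shifts
# do not break a field-based parser. Helper names are hypothetical.

# Read text on stdin, strip trailing whitespace, and collapse each run of
# two or more spaces into one tab. Single spaces inside a field are kept.
squeeze_columns() {
    tab=$(printf '\t')
    sed -e 's/[[:space:]]*$//' -e "s/ \{2,\}/${tab}/g"
}

# Extract text from the PDF named in $1 and normalise it.
# "-layout" asks pdftotext to keep the physical page layout;
# "-" as the output file sends the text to stdout.
pdf_to_fields() {
    pdftotext -layout "$1" - | squeeze_columns
}
```

A call such as pdf_to_fields report.pdf would then print tab-separated fields that can be split on tabs instead of fixed offsets. This will not help if the glitch reorders or drops text, and indented lines will start with a tab.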