Re: parse content of PDF file



by archfool (Monk)

on Aug 03, 2007 at 13:50 UTC ( [id://630510]=note )


in reply to parse content of PDF file

If there were a reliable way to do this, the software for it would cost a lot. The key word here is _scanned_: a scanned PDF typically holds page images rather than text, which means Optical Character Recognition (OCR), a very imperfect science at the moment. You will need OCR software, and there's very little free OCR software out there, let alone any with Perl bindings.

You'll need to convert the PDF to text with some OCR software FIRST. THEN running Perl against the resulting text will be easy.
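
To give an idea of the "easy" second step, here is a minimal sketch, assuming the OCR software has already produced plain text. The sub name `extract_fields`, the sample text, and the "Label: value" layout are all hypothetical; your document will need its own regex, and real OCR output will be messier than this.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull "Label: value" pairs out of OCR'd text.
# The "Label: value" layout is an assumption, not something from the
# original question; adapt the pattern to whatever your pages contain.
sub extract_fields {
    my ($text) = @_;
    my %fields;
    for my $line (split /\r?\n/, $text) {
        $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        $line =~ s/\s+/ /g;         # collapse the stray spacing OCR tends to leave
        next unless $line =~ /^([^:]+?) ?: ?(.+)$/;
        $fields{$1} = $2;           # label => value
    }
    return \%fields;
}

# Inline sample standing in for the OCR output. In practice you would
# slurp the text file your OCR tool wrote instead.
my $ocr_text = "Invoice No :  12345\nDate:   2007-08-03\nsome unrelated line\n";

my $fields = extract_fields($ocr_text);
print "$_ = $fields->{$_}\n" for sort keys %$fields;
```

Lines without a colon are skipped, so stray OCR noise between fields does not end up in the hash. Expect to spend most of your effort on cleaning up misrecognized characters rather than on the parsing itself.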