Had they been converted to PDF via Acrobat (or suchlike) rather than scanned as images, I would have suggested looking at CAM::PDF. However, I think you are going to have to OCR each page of each document, since IIRC there won't be any (meaningful) text to parse within the PDF.
You may want to start by looking at PDF::OCR (which IIRC uses Tesseract), or some other OCR module from CPAN.
Check out the code.google page for tesseract-ocr.
Update: Added link to tesseract-ocr.
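To make the "OCR each page" idea a bit more concrete, here is a rough sketch of how it could look. Note that it does not use PDF::OCR; it simply shells out, and it assumes that the `pdftoppm` (from poppler-utils) and `tesseract` command-line tools are installed and on your PATH. The sub names (`ocr_pdf`, `page_number`, `text_base_for`) and the 300 dpi setting are my own picks for illustration, so treat it as a starting point rather than a finished solution.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob);

# Extract the page number from a rendered image name such as "page-007.png".
sub page_number {
    my ($image) = @_;
    return $image =~ /-(\d+)\.png\z/ ? $1 + 0 : 0;
}

# tesseract takes an output *base* name and appends ".txt" itself,
# so strip the ".png" extension from the image path.
sub text_base_for {
    my ($image) = @_;
    (my $base = $image) =~ s/\.png\z//;
    return $base;
}

# OCR every page of one scanned PDF; returns a list with one string per page.
# Assumption: pdftoppm and tesseract are external programs found on PATH.
sub ocr_pdf {
    my ($pdf) = @_;
    my $dir = tempdir(CLEANUP => 1);

    # Render each page to a PNG (page-1.png, page-2.png, ...).
    # 300 dpi is a guess that tends to suit OCR; adjust for your scans.
    system('pdftoppm', '-r', '300', '-png', $pdf, "$dir/page") == 0
        or die "pdftoppm failed on $pdf\n";

    my @images = sort { page_number($a) <=> page_number($b) }
                 bsd_glob("$dir/page-*.png");

    my @pages;
    for my $image (@images) {
        my $base = text_base_for($image);

        # Writes the recognised text to "$base.txt".
        system('tesseract', $image, $base) == 0
            or die "tesseract failed on $image\n";

        open my $fh, '<:encoding(UTF-8)', "$base.txt"
            or die "cannot read $base.txt: $!\n";
        local $/;    # slurp the whole file in one read
        push @pages, scalar(<$fh>) // '';
        close $fh;
    }
    return @pages;
}

# Only run when called as a script with at least one PDF on the command line.
if (!caller() && @ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    for my $pdf (@ARGV) {
        my @pages = ocr_pdf($pdf);
        for my $i (0 .. $#pages) {
            printf "== %s, page %d ==\n%s\n", $pdf, $i + 1, $pages[$i];
        }
    }
}
```

Saved as, say, `ocr_pdfs.pl` (a made-up name), it would be run as `perl ocr_pdfs.pl scan1.pdf scan2.pdf`. How good the text is will depend a lot on the quality of the scans, so expect to clean up the output before parsing it.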
Hope this helps
Martin