in reply to pdf -> text

This is not a pure Perl solution, but it might help. You could use pdf2ps, which comes with Ghostscript, to convert the PDF to PostScript, then use ps2txt to get the text. Perl should help with the "first 400 words" part.
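
For the "first 400 words" part, here is a minimal sketch. It assumes a "word" is any run of non-whitespace characters, and first_words is just a name I made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $limit whitespace-separated
# words of $text, joined by single spaces.
sub first_words {
    my ($text, $limit) = @_;

    # split ' ' is the special awk-style split: it ignores leading
    # whitespace and splits on any run of spaces, tabs or newlines.
    my @words = split ' ', $text;

    # Truncate the list if it is longer than the limit.
    $#words = $limit - 1 if @words > $limit;

    return join ' ', @words;
}
```

Something like first_words($text, 400) would then give the excerpt, where $text holds whatever the converter produced.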

HTH, --traveler

Update: crenz is right. I knew I'd done it simply, but could not find the code. pdftotext is a good solution.
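
This is not the code I lost, just a rough sketch of how pdftotext could be called from Perl. It assumes pdftotext is installed and on your PATH and that you are on a Unix-like system; pdf_to_text is a made-up name. Giving "-" as the output file makes pdftotext write to standard output:

```perl
use strict;
use warnings;

# Hypothetical helper: run pdftotext on $pdf and return the text.
sub pdf_to_text {
    my ($pdf) = @_;

    # List-form pipe open bypasses the shell, so odd characters in the
    # file name are passed through untouched. "-" means "write to stdout".
    open(my $fh, '-|', 'pdftotext', $pdf, '-')
        or die "Cannot run pdftotext: $!";

    local $/;               # slurp mode: read everything at once
    my $text = <$fh>;

    close($fh) or die "pdftotext failed on $pdf (exit status $?)";
    return defined $text ? $text : '';
}

# Print the extracted text when a PDF is named on the command line.
print pdf_to_text($ARGV[0]) if @ARGV;
```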