in reply to Re: extract text from pdf
I have a PDF that I created with OpenOffice. pdftotext is able to extract the text from it, whereas CAM::PDF (or File::Extract::PDF) gives me garbage characters.
CAM::PDF (via getpdftext.pl):
[jerome@saab pdf]$ getpdftext.pl -v ~/faxTaxHabitation2005.pdf
! " # $ % # & ' ( " ) * + + + ...
And pdftotext:
[jerome@saab pdf]$ pdftotext ~/faxTaxHabitation2005.pdf txt
[jerome@saab pdf]$ tail txt
Merci de bien vouloir me confirmer ces informations par retour de fax afin que je puisse proceder au paiment le plus rapidement possible au numero suivant : *************
Cordiales salutations.
...
The ideal would be a Perl module linked against the xpdf C code... :)
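Until something like that exists, a stopgap can be sketched in Perl by shelling out to pdftotext. This is only a sketch and not a real binding to the xpdf code: it assumes pdftotext is installed and on the PATH, and the names pdftotext_command and pdf_to_text are hypothetical, made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the pdftotext argument list.
# Passing "-" as the output file makes pdftotext write to stdout.
sub pdftotext_command {
    my ($pdf) = @_;
    return ('pdftotext', '-enc', 'UTF-8', $pdf, '-');
}

# Hypothetical helper: return the extracted text of $pdf as one string.
# Assumes pdftotext is on the PATH.
sub pdf_to_text {
    my ($pdf) = @_;
    # List-form pipe open bypasses the shell, so spaces and other
    # odd characters in the file name need no quoting.
    open(my $fh, '-|', pdftotext_command($pdf))
        or die "cannot run pdftotext: $!";
    binmode($fh, ':encoding(UTF-8)');
    local $/;    # slurp mode: read everything in one go
    my $text = <$fh>;
    close($fh) or die "pdftotext failed on $pdf (exit status $?)";
    return $text;
}

# Usage: perl thisscript.pl file.pdf
if (@ARGV) {
    binmode(STDOUT, ':encoding(UTF-8)');
    print pdf_to_text($ARGV[0]);
}
```

It pays for a fork per document, which is exactly what a module linked to the xpdf code would avoid, but it does give the same output as the pdftotext run above.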
-- Nice photos of naked perl sources here !