PDF File

Kiko has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: PDF File by footpad (Abbot) on Jun 21, 2001 at 00:27 UTC
Looks like there are several CPAN modules that may help. Consider: PDF and Text::PDF::API, among others. Other random links include: Sanface Software, texexec, ConvertPS, and (as before) many others. --f
Re: PDF File by Hero Zzyzzx (Curate) on Jun 21, 2001 at 00:17 UTC
If you want to get the text out of the PDF file, use 'pdftotext', provided by xpdf. pdftotext works very well. You can write the text from the PDF to a file and then parse that text file with a Perl script.
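A minimal sketch of that approach, assuming pdftotext is installed and on your PATH. The sub names pdf_text and word_counts are made up for illustration, and the word count is only a stand-in for whatever parsing you actually need. It reads pdftotext's output through a pipe rather than through an intermediate file; passing '-' as the output name asks pdftotext to write to stdout.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count case-insensitive word occurrences in a string.
# Kept free of I/O so it can be tried without a PDF at hand.
sub word_counts {
    my ($text) = @_;
    my %count;
    $count{ lc $1 }++ while $text =~ /(\w+)/g;
    return \%count;
}

# Hypothetical helper: run pdftotext and return everything it extracts.
# The list form of open avoids the shell, so odd file names are safe.
sub pdf_text {
    my ($pdf) = @_;
    open( my $fh, '-|', 'pdftotext', $pdf, '-' )
        or die "Cannot run pdftotext: $!";
    local $/;    # slurp mode: read all output at once
    my $text = <$fh>;
    close($fh) or die "pdftotext failed on $pdf (exit status $?)";
    return $text;
}

# Only do work when a PDF file name was given on the command line.
if (@ARGV) {
    my $counts = word_counts( pdf_text( $ARGV[0] ) );
    for my $word ( sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b }
        keys %$counts )
    {
        printf "%6d  %s\n", $counts->{$word}, $word;
    }
}
```

Run it as `perl script.pl file.pdf` (the script name is arbitrary). If you would rather keep the intermediate text file, `pdftotext file.pdf file.txt` writes one that you can open and parse as usual.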
Re: Re: PDF File by clemburg (Curate) on Jun 21, 2001 at 12:12 UTC
Yup, this is the way to go. Done it several times, with good success. Multiple column text (like in newspapers or brochures) sucks, though, as you can't tell where the columns start. For this, a little manual work with ghostview might be needed (ghostview can copy and paste text from PDFs after it has extracted the text, e.g., after a search command). Christian Lemburg, Brainbench MVP for Perl, http://www.brainbench.com
Re: PDF File by Chady (Priest) on Jun 21, 2001 at 00:25 UTC
Check out what you can do with these modules. "He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life." Chady | http://chady.net/