in reply to PDF File

If you want to get the text out of a PDF file, use 'pdftotext', which is provided by xpdf.

pdftotext works very well. You can have it write the text from the PDF to a file (or to stdout, by giving '-' as the output file name) and then parse that text with a Perl script.
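
Here is a rough sketch of that approach, not a definitive implementation. It assumes pdftotext is on your PATH and that your perl supports the list form of a piped open. The sub names (pdf_text, split_pages, grep_pages) and the "find lines matching a word" parsing step are made up for illustration; swap in whatever parsing you actually need. It relies on pdftotext ending each page with a form feed character.

```perl
use strict;
use warnings;

# Run pdftotext and return everything it extracted as one string.
# The '-' output name makes pdftotext write to stdout instead of a file.
sub pdf_text {
    my ($pdf) = @_;
    # List-form pipe open: no shell involved, so odd file names are safe.
    open(my $fh, '-|', 'pdftotext', $pdf, '-')
        or die "Cannot run pdftotext: $!";
    local $/;    # slurp the whole output at once
    my $text = <$fh>;
    close($fh) or die "pdftotext failed on $pdf (status $?)";
    return defined $text ? $text : '';
}

# pdftotext ends each page with a form feed, so split on that.
sub split_pages {
    my ($text) = @_;
    return split /\f/, $text;
}

# Hypothetical parsing step: return [page number, line] pairs
# for every line that matches the given pattern.
sub grep_pages {
    my ($pattern, @pages) = @_;
    my @hits;
    for my $i (0 .. $#pages) {
        for my $line (split /\n/, $pages[$i]) {
            push @hits, [ $i + 1, $line ] if $line =~ $pattern;
        }
    }
    return @hits;
}

# Only runs when a PDF file name is given on the command line.
if (@ARGV) {
    my ($pdf, $word) = @ARGV;
    my $pattern = defined $word ? qr/\Q$word\E/ : qr/\S/;
    my @pages = split_pages(pdf_text($pdf));
    printf "page %d: %s\n", @$_ for grep_pages($pattern, @pages);
}
```

Saved as, say, pdfgrep.pl (a made-up name), you would run it as: perl pdfgrep.pl some.pdf keyword. If you would rather keep the intermediate text file, run "pdftotext some.pdf some.txt" first and read some.txt with a normal open instead of the pipe.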

Re: Re: PDF File
by clemburg (Curate) on Jun 21, 2001 at 12:12 UTC

    Yup, this is the way to go. Done it several times, with good success. Multi-column text (like in newspapers or brochures) sucks, though, as you can't tell from the extracted text where the columns start. For this, a little manual work with ghostview might be needed (ghostview can copy and paste text from PDFs after it has extracted the text, e.g., after a search command).

    Christian Lemburg
    Brainbench MVP for Perl
    http://www.brainbench.com