Kiko has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have an ABI Inform controlled vocabulary PDF file. I want to read all of its contents and place them in a database so that I can then integrate it with another script. Is there a way to do this? Has anyone done anything similar? Please let me know, I would greatly appreciate it. Thanks, Kiko

Replies are listed 'Best First'.
Re: PDF File
by footpad (Abbot) on Jun 21, 2001 at 00:27 UTC
Re: PDF File
by Hero Zzyzzx (Curate) on Jun 21, 2001 at 00:17 UTC

    If you want to get the text out of the PDF file, use 'pdftotext', provided by xpdf.

    pdftotext works very well. You can write the text from the PDF to a file and then parse that text file with a Perl script.
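
    A minimal sketch of that approach, assuming pdftotext is on your PATH and that each non-blank line of the extracted text is one vocabulary term. That parsing rule is only a guess at the file's layout, and the names pdf_to_lines and parse_terms are made up for illustration, so adjust both to what your PDF actually produces. Instead of going through an intermediate file, it reads pdftotext's output directly by passing "-" as the output name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run pdftotext (from xpdf) on a PDF and return the extracted text as lines.
# Passing "-" as the output file makes pdftotext write to standard output.
sub pdf_to_lines {
    my ($pdf_path) = @_;
    open(my $fh, '-|', 'pdftotext', $pdf_path, '-')
        or die "Cannot run pdftotext: $!";
    my @lines = <$fh>;
    close($fh) or die "pdftotext failed on $pdf_path (exit status $?)";
    chomp @lines;
    return @lines;
}

# Hypothetical parsing rule: one term per non-blank line.
# Trims whitespace, drops bare page numbers, and removes
# case-insensitive duplicates while keeping the original order.
sub parse_terms {
    my @lines = @_;
    my (%seen, @terms);
    for my $line (@lines) {
        $line =~ s/\f//g;           # pdftotext separates pages with form feeds
        $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        $line =~ s/\s+/ /g;         # collapse runs of whitespace
        next if $line eq '';
        next if $line =~ /^\d+$/;   # assumption: a lone number is a page number
        push @terms, $line unless $seen{lc $line}++;
    }
    return @terms;
}

# Only touch the PDF when a file name is given on the command line.
if (@ARGV) {
    my @terms = parse_terms(pdf_to_lines($ARGV[0]));
    print "$_\n" for @terms;
}
```

    Run it as "perl extract_terms.pl vocabulary.pdf" (both file names are placeholders) to get one term per line on standard output. From there the list could be loaded into a database, for example through DBI, once you know which columns you need.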

      Yup, this is the way to go. Done it several times, with good success. Multiple column text (like in newspapers or brochures) sucks, though, as you can't tell where the columns start. For this, a little manual work with ghostview might be needed (ghostview can copy and paste text from PDFs after it has extracted the text, e.g., after a search command).

      Christian Lemburg
      Brainbench MVP for Perl
      http://www.brainbench.com

Re: PDF File
by Chady (Priest) on Jun 21, 2001 at 00:25 UTC

    Check out what you can do with these modules


    He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.

    Chady | http://chady.net/