How to read the contents of a PDF

arunmep has asked for the wisdom of the Perl Monks concerning the following question:

Re: How to read the contents of a PDF by GrandFather (Saint) on Oct 18, 2006 at 09:15 UTC
A good place to start is the search engine available to you right here. For example, Super Search finds many nodes that are likely to be of use to you when you search SoPW for titles containing "PDF file": "Searching for the content in a PDF file", "Search within a PDF file", "Create and modify PDF files" and many more. Of course, a search on CPAN is likely to turn up good stuff too.
DWIM is Perl's answer to Gödel
Re: How to read the contents of a PDF by stvn (Monsignor) on Oct 18, 2006 at 12:06 UTC
I recommend CAM::PDF; you can see an example of text extraction in this script (getpdftext.pl, which ships with the CAM::PDF distribution). -stvn
Re^2: How to read the contents of a PDF by Anonymous Monk on Aug 29, 2013 at 08:46 UTC
The link is not working.
Re^3: How to read the contents of a PDF by Corion (Patriarch) on Aug 29, 2013 at 08:49 UTC
The link is not working because it points to a fixed version, CAM::PDF 1.08. The current version (at the time of writing) is 1.60, and it's not hard to find the correct link for that. Here is a "permalink" to the latest version: http://search.cpan.org/dist/CAM-PDF/bin/getpdftext.pl links to the version of the script in the latest distribution.
Re: How to read the contents of a PDF by blazar (Canon) on Oct 18, 2006 at 09:34 UTC
"Hi guys, I am developing a search engine. Can anybody tell me whether there is any way I can read the contents of a PDF file? Is there any package available that can read the entire content of a PDF inside a Perl script? Please let me know, thank you."
Probably, the best thing is to go to CPAN and search for "pdf" there. If you have difficulty locating a suitable package, or are uncertain about the relative merits of the candidates when many are available, then you're welcome to ask for clarification here.
Re^2: How to read the contents of a PDF by stvn (Monsignor) on Oct 18, 2006 at 12:16 UTC
"Probably, the best thing is to go to CPAN and search for pdf there."
Unfortunately, nothing capable of extracting text from a PDF file (at least from what I can see in the POD docs) shows up until page 27 of the CPAN search results. Searching CPAN can sometimes be an enormous waste of time, and asking here can save a lot of time and hassle.
UPDATE: I will acknowledge that PDF::API2 can stringify a PDF, but my personal experience (something which is not found on search.cpan.org, but is easily found here on PerlMonks) has been that it does not always work as expected. There are also several other misleading modules, like PDF::Parse and PDF::Extract. My main point is that sometimes just asking can save you a whole lot of time :) -stvn
Re^3: How to read the contents of a PDF by marto (Cardinal) on Oct 18, 2006 at 17:26 UTC
Did you look at CAM::PDF? Martin
Re^4: How to read the contents of a PDF by stvn (Monsignor) on Oct 18, 2006 at 19:36 UTC
Re^5: How to read the contents of a PDF by marto (Cardinal) on Oct 18, 2006 at 21:16 UTC
Re: How to read the contents of a PDF by arunmep (Beadle) on Oct 20, 2006 at 06:19 UTC
Hi Monks, after an exhaustive search I found a workaround and solved my problem. I got pdf2text.exe from the Google Desktop software, used it to extract the contents of the PDF files into a variable, searched the contents, and then deleted the file. Thanks for your replies.
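A minimal Perl sketch of that extract-then-search workflow, under stated assumptions: it swaps in Xpdf/Poppler's `pdftotext` command-line tool for the Google Desktop pdf2text.exe (whose options aren't covered here). Passing `-` as pdftotext's output file makes it write to standard output, so the text lands directly in a variable and there is no intermediate file to delete. The helper names `pdf_to_text` and `matching_terms` and the script name `pdfsearch.pl` are hypothetical, chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: run a PDF-to-text converter and capture its output.
# Assumes Xpdf/Poppler's pdftotext is on the PATH; '-' as the output file
# makes it write to STDOUT, so no temporary text file is created.
sub pdf_to_text {
    my ($pdf_file, @converter) = @_;
    @converter = ('pdftotext', '-q') unless @converter;    # -q: no messages

    # List-form piped open runs the command without a shell, so file names
    # containing spaces or quotes need no escaping.
    open(my $fh, '-|', @converter, $pdf_file, '-')
        or die "Cannot run $converter[0]: $!\n";
    my $text = do { local $/; <$fh> };    # slurp all output at once
    close($fh)
        or die "$converter[0] failed on $pdf_file (wait status $?)\n";
    return defined $text ? $text : '';
}

# Hypothetical helper: return the search terms that occur in the text,
# matched literally (\Q...\E) and case-insensitively.
sub matching_terms {
    my ($text, @terms) = @_;
    return grep { $text =~ /\Q$_\E/i } @terms;
}

# Usage: perl pdfsearch.pl FILE.pdf TERM [TERM ...]
if (@ARGV >= 2) {
    my ($pdf_file, @terms) = @ARGV;
    my $text = pdf_to_text($pdf_file);
    print "$pdf_file: $_\n" for matching_terms($text, @terms);
}
```

For a search engine you would probably call `pdf_to_text` once per file and index the returned text rather than re-running the converter for every query; the optional converter argument exists only so another text-extraction command can be substituted.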