The link is not working because it references a fixed CAM::PDF version, 1.08. The current version is (at the time of writing) 1.60, and it's not hard to find a correct link for that.
Here is my attempt at a "permalink" to the latest version:
http://search.cpan.org/dist/CAM-PDF/bin/getpdftext.pl links to the version of the script in the latest distribution.
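For doing the same from inside a Perl script, here is a minimal sketch using CAM::PDF directly. It assumes the module is installed from CPAN; `new()`, `numPages()` and `getPageText()` are CAM::PDF methods, while the sub name `pdf_page_texts` is hypothetical. getpdftext.pl itself may take a different internal route to render the text, so treat this as an approximation rather than the script's own code.

```perl
use strict;
use warnings;

# Return the text of every page of a PDF as a list of strings, using CAM::PDF.
# The module is loaded at run time so this file still compiles on machines
# where CAM::PDF is not installed. (pdf_page_texts is a hypothetical name.)
sub pdf_page_texts {
    my ($file) = @_;
    require CAM::PDF;
    no warnings 'once';    # $CAM::PDF::errstr is only mentioned once here
    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file: $CAM::PDF::errstr\n";
    # Pages are numbered from 1 in CAM::PDF.
    return map { $pdf->getPageText($_) } 1 .. $pdf->numPages();
}

# Usage: perl pdftext.pl file1.pdf [file2.pdf ...]
if (@ARGV) {
    for my $file (@ARGV) {
        my @pages = pdf_page_texts($file);
        print "== $file: ", scalar(@pages), " page(s)\n";
        print defined $_ ? $_ : '', "\n" for @pages;
    }
}
```

Text extraction quality depends on how each PDF was produced, so results may need checking against a few real files.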
Hi guys, I am developing a search engine. Can anybody tell me whether there is any way to read the contents of a PDF file? Is there a package that can read the entire content of a PDF from inside a Perl script? Please let me know. Thank you.
Probably the best thing is to go to CPAN and search for "pdf" there. If you have trouble finding a suitable package, or are unsure about the relative merits of the many that are available, you're welcome to ask for clarification here.
Probably, the best thing is to go to CPAN and search for pdf there.
Unfortunately, nothing capable of extracting text from a PDF file (at least as far as I can tell from the POD docs) shows up until the 27th page of the CPAN search results. Searching CPAN can sometimes be an enormous waste of time, and asking here can save a lot of time and hassle.
UPDATE: I acknowledge that PDF::API2 can stringify a PDF, but my personal experience (the kind of thing you won't find on search.cpan.org but can easily find here on PerlMonks) is that it does not always work as expected. There are also several other misleading modules, such as PDF::Parse and PDF::Extract. My main point is that sometimes just asking can save you a whole lot of time :)
Hi Monks,
After an exhaustive search I found a workaround and solved my problem. I used the pdf2text.exe that comes with the Google Desktop software, extracted the contents of the PDF files into a variable, searched the contents, and then deleted the file. Thanks for your replies.
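The same workaround can be sketched in Perl. This version swaps in the `pdftotext` command-line tool from Xpdf/Poppler for the Google Desktop `pdf2text.exe` (whose options aren't described here) and assumes it is on your PATH. Passing `-` as the output file makes pdftotext write to standard output, so the text goes straight into a variable and there is no temporary file to delete. The sub names `pdf_to_text` and `matching_terms` are hypothetical, and on Windows older perls may not support the list form of a piped open used here.

```perl
use strict;
use warnings;

# Extract the text of a PDF by running the external pdftotext tool
# (assumed installed and on PATH). "-" as the output file sends the text
# to STDOUT, which we read through a pipe instead of a temp file.
sub pdf_to_text {
    my ($file) = @_;
    open(my $fh, '-|', 'pdftotext', $file, '-')
        or die "Cannot run pdftotext on $file: $!\n";
    local $/;                 # slurp mode: read all of the output at once
    my $text = <$fh>;
    close($fh) or die "pdftotext failed on $file (status $?)\n";
    return defined $text ? $text : '';
}

# Return the search terms that occur anywhere in $text (case-insensitive).
# \Q...\E quotes regex metacharacters so terms are matched literally.
sub matching_terms {
    my ($text, @terms) = @_;
    return grep { $text =~ /\Q$_\E/i } @terms;
}

# Usage: perl pdfsearch.pl file.pdf term1 [term2 ...]
if (@ARGV >= 2) {
    my ($file, @terms) = @ARGV;
    my $text = pdf_to_text($file);
    print "$file: $_\n" for matching_terms($text, @terms);
}
```

For example, `perl pdfsearch.pl report.pdf invoice total` (file name and terms are placeholders) prints each term found in the PDF's text.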