arunmep has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I have downloaded XPDF to convert PDF to text. I need to get the PDF text into an array element in Perl. Can anybody explain how to do it?

Replies are listed 'Best First'.
Re: pdf to text
by prasadbabu (Prior) on Jan 12, 2006 at 09:06 UTC

    arunmep, if I understood your question correctly, this may help you. Also take a look at CAM::PDF. Your question is not clear: please frame it so that it includes the input, the required output, and the code you have tried. That makes it easy for the monks to answer exactly; otherwise you will only get answers based on assumptions.

    use strict;
    use warnings;
    use CAM::PDF;

    my $self = CAM::PDF->new('test.pdf') or die "Cannot open test.pdf\n";
    my @text;
    for my $page (1 .. $self->{'PageCount'}) {    # total number of pages
        my $pages = $self->getPageText($page);
        push @text, $pages;    # push the extracted text into an array
    }

    Updated: Added ->{'PageCount'}. Thanks svenXY.

    Prasad

      Hi prasadbabu++,
      thanks for mentioning CAM::PDF - great!
      just as a side note: for me, calling ->new() does not return the number of pages, I had to do the following:
      Regards,
      svenXY
Re: pdf to text
by jbrugger (Parson) on Jan 12, 2006 at 09:08 UTC
    See man xpdf, then man pdftotext.
    You'll find: pdftotext [options] [pdf-file [text-file]].
    You can use backticks (``) or qx, or system or exec (see perlfunc), to execute the command from within Perl.
    Then open the file and read it.
    open(my $fh, '<', 'yourfilename') or die "can't open: $!";
    my @lines = <$fh>;    # each line of the text file becomes one array element
    close $fh;
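
The two steps above (run pdftotext, then read the result) can also be combined by reading the command's output through a pipe. What follows is only a minimal sketch, not a tested recipe for every setup: it assumes pdftotext is on your PATH and that, as the pdftotext man page describes, giving "-" as the text-file sends the text to stdout. The helper names read_command_lines and pdf_to_lines are made up for illustration. A pipe open is used rather than exec, because exec replaces the running Perl process and never returns to your script.

```perl
use strict;
use warnings;

# Run a command (given as a list, so no shell quoting is involved)
# and return its standard output as a list of chomped lines.
sub read_command_lines {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "can't run $cmd[0]: $!";
    chomp(my @lines = <$fh>);
    # Closing a pipe handle fails if the command exited with an error.
    close($fh) or die "$cmd[0] failed, exit status " . ($? >> 8) . "\n";
    return @lines;
}

# Hypothetical wrapper: assumes pdftotext is installed and that '-'
# as the output file name means "write the text to stdout".
sub pdf_to_lines {
    my ($pdf_file) = @_;
    return read_command_lines('pdftotext', $pdf_file, '-');
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my @text = pdf_to_lines($ARGV[0]);
    print scalar(@text), " lines extracted\n";
}
```

Each element of @text is then one line of the PDF's text. If you would rather have one element per page, the CAM::PDF approach in the other reply is probably the better fit.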


    *Update*: Ah, I see prasadbabu understood your question better than I did. I second his post.

    "We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise." - Larry Wall.
      I need to extract the text from PDF files into an array element. How can I do that using Perl?
        arunmep,

        A few of your previous nodes ask this same question. Is there a particular part of the process you are having problems with? Have you used Super Search to find any examples? This has been covered before.

        Cheers,

        Martin