MachsMit has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, this is my first time posting, and I've tried to format this post as well as I could. Briefly, I'm a beginner programmer with experience in Scheme and a little Python. I just started learning Perl last week, and so far I've read a fair amount of perldoc and gone through a presentation from an MIT course.

I have a directory full of PDFs. I want to open each one, select all, copy, create a new .txt with the same name (preferably in a folder within this directory), and paste the copied PDF text into each .txt. Then I want to go through every .txt, extract 4 important pieces of information, and paste them into an Excel table.

First question: is Perl the best language to do this in? If not, what should I be using? I have looked into AutoIt, but I don't have admin access to this computer, which is a problem.

If this is possible, here is what I have written so far, in an attempt to stitch together what I've learned in the last two days from reading perldoc and from my extensive Google searching.

#!c:/perl/bin/perl -w
use strict;
use warnings;
use Win32::OLE;
use File::Find;

# Start Excel and make it visible
my $xlApp = Win32::OLE->new('Excel.Application');
$xlApp->{Visible} = 1;

# Create a new workbook
my $xlBook = $xlApp->Workbooks->Add;

open FILE, "<Sodeco.txt";
my @report = <FILE>;
#print "@report\n";
#print "$report[2]\n";

my $counter = 0;
my $column = "A";
my $row = 1;
my $dir = "D:/Documents and Settings/m0F61468/Desktop/New Folder";

find(\&textify, $dir);

sub textify() {
    my $file = $_;
    print "File name is $_\n\t\tFull path is $File::Find::name\n";
}

my @fourthings = ();
foreach (@report) {
    if (/^Spare part/) {
        $fourthings[2] = substr("$report[$counter]",12,-1);
    } elsif (/^Value of this amendment/) {
        $fourthings[3] = substr("$report[$counter]",25,-5);
    } elsif (/^Lurgi GmbH, Lurgiallee 5, D-60439 Frankfurt am Main +/) {
        $fourthings[1] = substr("$report[$counter+2]",0,-1);
    } elsif (/^PO reference/) {
        $fourthings[0] = substr("$report[$counter+1]",0,10);
    }
    $counter += 1;
}
#print "@fourthings\n";
#print "$fourthings[0]\n";
#print "$fourthings[1]\n";
#print "$fourthings[2]\n";
#print "$fourthings[3]\n";

my $four = [$fourthings[0], $fourthings[1], $fourthings[2], $fourthings[3]];
print "$four\n";

# Write all the data at once...
my $rng = $xlBook->ActiveSheet->Range("A1:D1");
$rng->{Value} = $four;

The things that work in this code: assuming I have already converted all the PDFs to .txts using this imaginary subroutine "textify", the search loop works very well. I really like the regular expressions in Perl. I know that if I could just figure out how to accomplish the PDF to text conversion, the second part (extracting the info and pasting to Excel) would be cake.

I get the feeling that I just don't know nearly enough to do this, though. If the answer to this question is "learn more Perl," my reply is: what exactly should I read more about, and where can I find it? I am not here to have y'all write my program for me. I really just want to know what module I need to learn, or what things I should learn about Perl to accomplish this, and whether it is possible. I do want to learn more Perl in the near future, but right now this in particular is frustrating me. Thank you.

One more thing I noticed: in order to get the paste into Excel to work, I had to use this square-bracket notation and a scalar variable. See the line: my $four = [4 scalars]. For some reason it won't take the array @fourthings and paste it into Excel. Probably because it's an array. I can work around that, though; it is a secondary question and only out of curiosity.

-Nils

Re: Accessing PDF files using Win32 OLE, Windows File Management
by Anonymous Monk on Jul 15, 2011 at 12:29 UTC

    First question: is Perl the best language to do this in?

    Perl is good enough

    If not, what should I be using?

    VBScript, JScript, PowerShell ... but if you don't know any of those, Perl will work

    just figure out how to accomplish the PDF to text conversion

    Why can't you do the same thing you're doing with Excel: use OLE on the PDF reader, select all, copy, paste?

    The last time I did something similar, I used CAM::PDF's getpdftext.pl with Spreadsheet::WriteExcel

    For some reason it won't take the array @fourthings and paste it into Excel

    Probably a syntax error on your part, as evidenced by your usage of array indices

    my @fourthings = 1..4;
    my $fourthings = \@fourthings;
    my $stillFourthings = $fourthings;
    my $newFourthings = [ @fourthings ];

    See Tutorials: Data Type: Array, references quick reference, perlintro, http://learn.perl.org/books/beginning-perl/
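
The snippet above can be expanded into a small runnable sketch of how an array, a reference to it, and an anonymous copy behave. The sample values are made up for illustration. It also shows what an array yields in scalar context, which is presumably why assigning @fourthings directly to the range did not behave as expected, while an array reference such as \@fourthings or [ @fourthings ] did.

```perl
use strict;
use warnings;

# Made-up sample values standing in for the four extracted fields.
my @fourthings = ('PO-0001', 'Some vendor', 'Spare part X', '100.00');

my $ref  = \@fourthings;       # reference to the SAME array
my $copy = [ @fourthings ];    # reference to a NEW anonymous array (a copy)

# Changing an element through $ref changes @fourthings, but not the copy.
$ref->[0] = 'PO-0002';

# In scalar context an array evaluates to its element count, not its contents.
my $count = @fourthings;

print "original: $fourthings[0]\n";
print "copy:     $copy->[0]\n";
print "count:    $count\n";
print "via ref:  @{$ref}\n";
```

With Win32::OLE, either $ref or $copy could be assigned to the range's Value in place of the hand-built $four from the original post, since all three are array references.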

      I tried to figure out how to use OLE on the PDF reader, but there's just so much new stuff going on in OLE that I couldn't even get a handle on it. I've looked for a good intro to OLE but can't find one. I couldn't even figure out how to open Adobe Reader using an OLE command... For now I'll take a look at CAM::PDF and the other things you suggested. Thanks for the reply.

Re: Accessing PDF files using Win32 OLE, Windows File Management
by Anonymous Monk on Jul 15, 2011 at 14:11 UTC
    I had to extract text from PDF files last year and was successful with CAM::PDF
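
As a rough sketch of the CAM::PDF route mentioned in both replies, the following converts every PDF in a directory to a .txt file of the same name in a subfolder. It assumes CAM::PDF is installed from CPAN; the helper names txt_path_for and pdf_to_text and the subfolder name "txt" are hypothetical choices for illustration, not part of any module. Text extraction quality depends on how each PDF was produced, so the output may need checking.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Map "dir/report.pdf" to "dir/<subdir>/report.txt".
# (Hypothetical helper; the subfolder name is chosen by the caller.)
sub txt_path_for {
    my ($pdf_path, $subdir) = @_;
    my ($name, $dir) = fileparse($pdf_path, qr/\.pdf/i);
    return File::Spec->catfile($dir, $subdir, "$name.txt");
}

# Return the text of all pages of one PDF, joined by newlines.
# CAM::PDF is loaded lazily so the rest of the script works without it.
sub pdf_to_text {
    my ($pdf_path) = @_;
    require CAM::PDF;
    my $pdf = CAM::PDF->new($pdf_path)
        or die "Cannot open $pdf_path as a PDF\n";
    return join "\n",
        map { $pdf->getPageText($_) // '' } 1 .. $pdf->numPages();
}

# Convert every PDF in the directory given on the command line, if any.
if (my $dir = shift @ARGV) {
    my $out_dir = File::Spec->catdir($dir, 'txt');
    mkdir $out_dir unless -d $out_dir;

    opendir my $dh, $dir or die "Cannot read $dir: $!\n";
    for my $file (grep { /\.pdf$/i } readdir $dh) {
        my $pdf_path = File::Spec->catfile($dir, $file);
        my $txt_path = txt_path_for($pdf_path, 'txt');
        open my $out, '>', $txt_path or die "Cannot write $txt_path: $!\n";
        print {$out} pdf_to_text($pdf_path);
        close $out;
    }
    closedir $dh;
}
```

Saved under a name of your choosing, it would be run with the PDF directory as its only argument. The resulting .txt files can then be fed to the extraction loop from the original post.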