MachsMit has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks, this is my first time posting, and I've tried to format this post as well as I could. Briefly, I'm a beginner programmer with experience in Scheme and a little Python. I just started learning Perl last week, and so far I've read a fair amount of perldoc and gone through a presentation from an MIT course.

I have a directory full of PDFs. I want to open each one, select all, copy, create a new .txt with the same name (preferably in a folder within this directory), and paste the copied PDF text into each .txt. Then I want to go through every .txt, extract 4 important pieces of information, and paste them into an Excel table.

First question: is Perl the best language to do this in? If not, what should I be using? I have looked into AutoIt, but I don't have admin access to this computer, which is a problem.

If this is possible, here is what I have written so far, in an attempt to stitch together what I've learned in the last two days from reading perldoc and from my extensive Google searching.
    #!c:/perl/bin/perl -w
    use strict;
    use warnings;
    use Win32::OLE;
    use File::Find;

    # Start Excel and make it visible
    my $xlApp = Win32::OLE->new('Excel.Application');
    $xlApp->{Visible} = 1;

    # Create a new workbook
    my $xlBook = $xlApp->Workbooks->Add;

    open FILE, "<Sodeco.txt";
    my @report = <FILE>;
    #print "@report\n";
    #print "$report[2]\n";

    my $counter = 0;
    my $column = "A";
    my $row = 1;
    my $dir = "D:/Documents and Settings/m0F61468/Desktop/New Folder";

    find(\&textify, $dir);

    sub textify() {
        my $file = $_;
        print "File name is $_\n\t\tFull path is $File::Find::name\n";
    }

    my @fourthings = ();
    foreach (@report) {
        if (/^Spare part/) {
            $fourthings[2] = substr("$report[$counter]",12,-1);
        } elsif (/^Value of this amendment/) {
            $fourthings[3] = substr("$report[$counter]",25,-5);
        } elsif (/^Lurgi GmbH, Lurgiallee 5, D-60439 Frankfurt am Main +/) {
            $fourthings[1] = substr("$report[$counter+2]",0,-1);
        } elsif (/^PO reference/) {
            $fourthings[0] = substr("$report[$counter+1]",0,10);
        }
        $counter += 1;
    }
    #print "@fourthings\n";
    #print "$fourthings[0]\n";
    #print "$fourthings[1]\n";
    #print "$fourthings[2]\n";
    #print "$fourthings[3]\n";

    my $four = [$fourthings[0], $fourthings[1], $fourthings[2], $fourthings[3]];
    print "$four\n";

    # Write all the data at once...
    my $rng = $xlBook->ActiveSheet->Range("A1:D1");
    $rng->{Value} = $four;
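The `textify` callback in the script only prints file names. As a rough sketch of how the conversion step could look, the version below walks the directory and hands each PDF to the external `pdftotext` command-line tool. This assumes `pdftotext` (from Xpdf or Poppler) is available on the PATH, which the post does not confirm; the `txt` subfolder name and the `txt_path_for` helper are also made up for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Find;
use File::Path qw(make_path);
use File::Spec;

# Map "dir/report.pdf" to "dir/txt/report.txt".
# "txt" is a hypothetical subfolder name chosen for this sketch.
sub txt_path_for {
    my ($pdf) = @_;
    my ($name, $dir) = fileparse($pdf, qr/\.pdf$/i);
    return File::Spec->catfile($dir, 'txt', "$name.txt");
}

# Convert one PDF by shelling out to pdftotext.
# ASSUMPTION: pdftotext is installed and on the PATH.
sub textify {
    my ($pdf) = @_;
    my $txt = txt_path_for($pdf);
    make_path((fileparse($txt))[1]);    # create the txt/ subfolder if needed
    system('pdftotext', '-layout', $pdf, $txt) == 0
        or warn "pdftotext failed for $pdf\n";
    return $txt;
}

# Convert every PDF under the directory given on the command line.
# no_chdir keeps $_ equal to $File::Find::name, so relative paths stay valid.
if (@ARGV) {
    find({
        wanted   => sub { textify($_) if -f $_ && /\.pdf$/i },
        no_chdir => 1,
    }, $ARGV[0]);
}
```

Saved under a name of your choosing, it would be run with the PDF directory as its only argument. The resulting .txt files could then be fed to the extraction loop one at a time instead of the hard-coded Sodeco.txt.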
The things that work in this code: assuming I have already converted all the PDFs to .txt files using this imaginary subroutine "textify", the search loop works very well. I really like the regular expressions in Perl. I know that if I could just figure out how to accomplish the PDF to text conversion, the second part (extracting the info and pasting to Excel) would be cake.

I get the feeling that I just don't know nearly enough to do this, though. If the answer to this question is "learn more Perl," my reply is: what exactly should I read more about, and where can I find it? I am not here to have y'all write my program for me. I really just want to know what module I need to learn, or what things I should learn about Perl to accomplish this, and whether it is possible. I do want to learn more Perl in the near future, but right now this in particular is frustrating me. Thank you.

One more thing I noticed: in order to get the paste into Excel to work, I had to use this double bracketed notation and a scalar variable. See: my $four = 4 scalars. For some reason it won't take the array @fourthings and paste it into Excel. Probably because it's an array. That I can work around though; that is a secondary question and only out of curiosity.

-Nils
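On the secondary question, one likely explanation is Perl's context rules rather than anything specific to Excel: assigning to a single hash element such as `$rng->{Value}` is a scalar assignment, so an array on the right-hand side is evaluated in scalar context and yields its element count. The sketch below shows this with a plain hash standing in for the OLE range object (so it runs without Excel); the sample values are made up.

```perl
use strict;
use warnings;

# Made-up sample values standing in for the four extracted fields.
my @fourthings = ('PO-0000001', 'Example Supplier', 'Example part', '100.00');

# A plain hash used here as a stand-in for the OLE range object.
my %range;

# Assigning an array to one hash element puts it in scalar context,
# so only the number of elements is stored.
$range{Value} = @fourthings;
my $as_scalar = $range{Value};
print "array assigned:     $as_scalar\n";     # prints 4

# An array reference is itself a single scalar, so all four values survive.
$range{Value} = \@fourthings;
my $as_ref = $range{Value};
print "reference assigned: @$as_ref\n";
```

This is why the `$four` array reference in the script gets through where `@fourthings` does not; `\@fourthings` or `[@fourthings]` would be shorter ways to build the same kind of reference.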
Replies are listed 'Best First'.

Re: Accessing PDF files using Win32 OLE, Windows File Management
by Anonymous Monk on Jul 15, 2011 at 12:29 UTC
by MachsMit (Initiate) on Jul 15, 2011 at 13:21 UTC
by Anonymous Monk on Jul 15, 2011 at 14:02 UTC

Re: Accessing PDF files using Win32 OLE, Windows File Management
by Anonymous Monk on Jul 15, 2011 at 14:11 UTC