applefiend has asked for the wisdom of the Perl Monks concerning the following question:

Just to predicate this post, I am a fairly inexperienced programmer and totally new to Perl. I am currently writing my first script to become more familiar with this language. I use Windows Vista and have Strawberry Perl on my computer.

Right now I'm working on a script that reads through a directory of Excel files of student records, sorts them by name and version number, and picks out only the most recent version. File names might look something like:

(the format is in last-name_first-name_major_version#)

doe_john_ee_v02.xlsx, doe_john_ee_v05.xlsx, smith_mary_cpe_v02.xlsx

The directory has over 1000+ files in it but filters out everything except the most recent ones(about 250 files). The script cycles through the remaining files, opens them, picks out a date from one cell, and then prints that date, the student name, and student major to an output.

Currently my script works fine but is running extremely slowly. It takes 5-7 minutes to open and close those 250 files which seems excessively long. I'm hoping that someone might be able to provide a little insight into this problem and provide a method to improve performance.

My script utilizes the OLE interface to work with Excel. At the beginning of my code I have:

use OLE; use Win32::OLE; my($xapp) = CreateObject OLE 'Excel.Application' || die $!; $xapp->{'Visible'} = 0;

At this point I sort through the files and pick out the relevant ones. I then cycle through them (@dateSort) to open the files which can be seen from my code below:

for $byDate(@dateSort) { @lines=split/_/,$byDate; # #Read file names in order to retrieve latest revision #date from Excel sheets # $name=join "_", @lines[1,2,3,4]; # # Open the excel file # $fileName = $current_dir.'\\'.$name; if ($output ne '') { print STDOUT "Opening $name"; } $wkb = $xapp->Workbooks->Open($fileName); if ( ! $wkb ) { print STDOUT "\n\nCan't open $name. File name format +incompattible.\n"; exit; } #Open correct sheet from wkbk my $wks = $wkb->Worksheets(1); #Find the latest revision date from the sheet $lastUpdate = $wks->Cells(8,B)->{'Value'}; $wkb->{Saved} = 1; $wkb->Close(); $xapp->Quit();

And thats it. Does anyone know why it's taking so long? Is there anything I can do to speed this process up. Any help would be much appreciated. Thanks, John

Replies are listed 'Best First'.
Re: Speeding up my script
by GrandFather (Saint) on Feb 20, 2011 at 23:47 UTC

    I suspect much of the slowness is just Excel going about the business of opening files. If you are using an old version of Excel, or at least using the old binary file format (2003 or earlier) then Spreadsheet::ParseExcel may help by avoiding OLE and Excel altogether.

    True laziness is hard work
Re: Speeding up my script
by CountZero (Bishop) on Feb 20, 2011 at 23:52 UTC
    Perhaps it is faster to read your Excel files with Spreadsheet::ParseExcel or Spreadsheet::ParseExcel::Simple?

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Speeding up my script
by Monkomatic (Sexton) on Feb 21, 2011 at 00:14 UTC

    Oddly printing to stdout takes alot of time on windows xp. It might be even worse on vista. (I wouldn't know im still waiting another year for yet another micro-slow product to be beta tested) You might want to try Tee from cpan. I use it to print to both file and stdout to debug then turn it off and print to logfile only when im finished debugging.

    Alternatively you could also try to save the output into an array and then print out the array when the program is done.

    Although i doubt it will speed it up much. It is probably use Win32::OLE i would try the recommended alternatives.

    use IO::Tee; open my $ofh, '>>', 'LOGFILE.txt' or die "Cannot append to 'LOGFILE.tx +t':$!"; my $tee = IO::Tee->new(\*STDOUT, $ofh); # Prints to both file and stdo +ut #my $tee = IO::Tee->new(\*$ofh);Prints to file only print $tee "Opening $name";