Idris has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, First of, I'm relatively new at perl so if my question sounds stupid or my code doesn't make sense to you, you will understand why. I wrote the following code to parse Microsft Excel files in a certain directory, and generate a summary file. The problem is the code run really very slow. I mean it takes hours on a my powerbook g4 550, with 512 ram. and pretty nothing else is runnnig. I'm not sure why this is the case. I would really appreciate it if someone can point out to me what things I'm doing which might be slowing down the code. Thanks in advance.
#!/usr/bin/perl use strict; $|++; use Spreadsheet::ParseExcel; use Spreadsheet::WriteExcel; my %heading = ( bold => 1, pattern => 1, fg_color => 15, border => 1, align => 'center', ); my @filesToBeProcessed = <INC.*.xls>; my $file; my @arrayOfData; my $totalDayMinutes = 0; my $totalNightMinnites = 0; my $totalGSMMinutes = 0; for (my $x = 0; $x <= $#filesToBeProcessed; $x++) { my $oExcel = Spreadsheet::ParseExcel->new(); my $oBook = $oExcel->Parse($filesToBeProcessed[$x]); my $oWkS = $oBook->{Worksheet}[1]; my @data = getFieldData($oWkS); $data[0] =~ s/\'//gm; $arrayOfData[$x] = [ @data ]; $totalDayMinutes = $totalDayMinutes + $data[1]; $totalNightMinnites = $totalNightMinnites + $data[2]; $totalGSMMinutes = $totalGSMMinutes + $data[3]; } my $workbook = Spreadsheet::WriteExcel->new("summary.xls"); my $worksheet = $workbook->add_worksheet("Summary"); my $heading = $workbook->add_format(%heading); my $format = $workbook->add_format(); my @fields =("Date", "Day Minutes", "Night Minutes", "GSM"); for (my $x = 0; $x < @fields; $x++) { $worksheet->set_column(0, $x, 15); $worksheet->write(0, $x, $fields[$x], $heading); } $format->set_bold(); $format->set_color('black'); $format->set_align('center'); for (my $i = 0; $i <= $#arrayOfData; $i++) { for (my $j = 0; $j < $#{$arrayOfData[$i]}; $j++) { $worksheet->write(($i + 1), $j, $arrayOfData[$i][$j], $form +at); } } $worksheet->write(($#arrayOfData + 2), 1, $totalDayMinutes, $format); $worksheet->write(($#arrayOfData + 2), 2, $totalNightMinnites, $format +); $worksheet->write(($#arrayOfData + 2), 3, $totalGSMMinutes, $format); $workbook->close(); ###################### # Subroutines ###################### sub getFieldData { my ($workSheet) = $_[0]; my @array; for (my $x = 0, my $y = 0; $x <= $workSheet->{MaxRow}; $x++, $y++) { my $cell = $workSheet->{Cells}[$x][1]; $array[$y] = $cell->Value,if($cell); } return @array; }

Title edit by tye

Replies are listed 'Best First'.
Re: Slow Code
by jmcnamara (Monsignor) on Aug 25, 2003 at 10:08 UTC

    You should separate the Spreadsheet::ParseExceland Spreadsheet::WriteExcel code to see which one is causing the problem. There are a couple of things to look out for.

    Spreadsheet::ParseExcel reads entire Excel files into memory and then into Perl data structures. For large files this can be slow. Run Spreadsheet::ParseExcel against filesizes of 0.5MB, 1MB, 5MB and 10MB and you'll see the effect.

    Spreadsheet::WriteExcel may run slowly if it cannot create temporary files. On most OSes this doesn't happen (IIS on Windows is a known exception). See the set_tempdir() section of the documentation for information on how to detect this problem and resolve it.

    See also Out of memory using Spreadsheet::ParseExcel and the Spreadsheet::ParseExcel_XLHTML module which should be quicker than Spreadsheet::ParseExcel although with fewer features.

    --
    John.

      A quick way to use environment variables to get an useful temporary directory, as in Windows systems, is reading the TEMP variable:

      $workbook->set_tempdir($ENV{'TEMP'});

Re: Slow Code
by liz (Monsignor) on Aug 25, 2003 at 09:43 UTC
    I assume you are aware of the Benchmark module? It allows you to check which parts of your code are slow.

    Particularly put a benchmark around each of these lines:

    my $oExcel = Spreadsheet::ParseExcel->new(); my $oBook = $oExcel->Parse($filesToBeProcessed[$x]); my $oWkS = $oBook->{Worksheet}[1]; my @data = getFieldData($oWkS);
    and you'll probably be able to see the "guilty" party.

    Liz

Re: Slow Code
by CountZero (Bishop) on Aug 25, 2003 at 12:56 UTC

    Sometimes a fresh approach is helpful (TMTOWTDI!).

    What would you think of opening these spreadsheets as database-files through DBI and DBD::Excel?

    I'm not sure if it would be faster (internally this DBD-module also uses Spreadsheet::ParseExcel and Spreadsheet::WriteExcel) and of course the worksheet must be in a typical format (each worksheet is considered to be a TABLE, and the first row must contain the field-names).

    Update: I just thought, perhaps DBD::ODBC is faster as it uses a native driver to access your Excel spreadsheets.

    The good thing however is that you can use standard SQL to get your data (very easy and fast to get totals, ... etc), so you don't have to code this in your script..

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Slow Code
by Idris (Initiate) on Aug 25, 2003 at 15:45 UTC
    I'm currently looking at all the idea's listed above and would post my results here. Thanks for all your ideas.