awohld has asked for the wisdom of the Perl Monks concerning the following question:

I'm parsing 6 Excel files totaling 34 MB in a directory and storing them in a DB. Each Excel file has multiple worksheets and each worksheet in every file has the same column format, except column 28 which can be repeated 1 to 5 times at random.

I created a program to parse the files:
#!/usr/bin/perl -w use strict; use Spreadsheet::ParseExcel::Simple; use DBI; # Open directory and get list of all Excel files opendir(DIR , '/www/cgi-bin/data'); my @files = grep { /\.xls$/ } readdir(DIR); closedir(DIR); # File to store all parsed Excel data my $storeFile = '/www/cgi-bin/data/store.dat'; open(STORE,'>',"$storeFile") || die("Cannot Open File"); # DB Column Headings my @header; foreach my $file ( @files ) { my $xls = Spreadsheet::ParseExcel::Simple->read('/www/cgi-bin/data/' + . $file); foreach my $sheet ($xls->sheets) { while ($sheet->has_data) { my @data = $sheet->next_row; # If first column of a row is 'Manu', use this as the head +er if ($data[0] eq 'Manu') { @header = ( @data[0..26], 'compre +ssedColumnName') } # If first column of a row is 'Dell', skip it. if ($data[0] ne 'Dell') { next } # Save first 27 elements of the list. my @info = @data[0..26]; # Delete the first 27 elements of the list. splice @data,0,27; # Join all remaining list elements into a string my $remaining = join(':', @data); # Join them together and delimit list items with a '^' my @out = ( @info, $remaining ); my $out = join('^', @out); # Save it to the temp STORE file. print STORE "$out\n"; } } } close STORE; my $database = "db"; my $db_server = "localhost"; my $user = "user"; my $password = "pass"; # Connect to database my $dbh = DBI->connect("DBI:mysql:$database:$db_server",$user,$passwor +d); $dbh->do("DROP TABLE `table`"); # Format DB Headers, strip out all characters except letters. my @newlist; foreach my $head (@header) { $head =~ s/[^a-zA-Z]//g; push(@newlist, $head); } my $SQL = "CREATE TABLE `table` (". join( " varchar(255),", @newlist ) . " varchar(255))"; $dbh->do($SQL); $SQL = "LOAD DATA LOCAL INFILE '$storeFile' INTO TABLE `table` FIELDS +TERMINATED BY '^' LINES TERMINATED BY '\n' "; $dbh->do($SQL); print "Excel Injected " . scalar localtime() . "\n"; my $excelLog = '/www/cgi-bin/tvw/excel.log'; open(LOG,'>',"$excelLog") || die("Cannot Open File"); print LOG "<font color=\"lime\"><b>Updated " . scalar localtime() . "< +/b></font>\n"; close LOG; exit;

For starters I get this warning, but the program still works:Character in 'C' format wrapped in pack at /usr/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.

When I run this program while it's parsing it uses approx 100% CPU and close to 90%+ RAM on a 2GHz P4 760 MB RAM.

The program goes all the way to the end.

It parses the Excel files, creates the DB Table, inserts all the data into the DB table, and prints the last status line to the "excel.log" logfile. But the file never finishes running, it never exits and I even use exit;. I still see the program running with 77% proc and 50% mem

Why is this program still running?

And is there a way to keep Spreadsheet::ParseExcel::Simple from using so much memory? I didn't see anything about that in the CPAN documentation.

Replies are listed 'Best First'.
Re: Excel Parse Example - Program Hangs
by jmcnamara (Monsignor) on Dec 06, 2005 at 09:45 UTC
      Even though I run high on memory, the program completes.

      If I run out of memory is it possible that the program will complete and get messed up somehow and cause the probelm I'm seeing?

      If the Spreadsheet::ParseExcel::Simple goes out of scope each time in the foreach loop, is the object memory getting destroyed and recreated in each loop?

      Edit: Ahh I see that the memory isn't nescessarially deallocated when the object goes out of scope according to some of the links you suggested.
Re: Excel Parse Example - Program Hangs
by Crackers2 (Parson) on Dec 07, 2005 at 06:59 UTC

    Unrelated to your actual question, but isn't this:

    # If first column of a row is 'Dell', skip it. if ($data[0] ne 'Dell') { next }
    a case of inverted logic between comment and code? i.e. won't this skip a row if the first column is NOT 'Dell'?