martinslmn has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have Two Excel workbooks with 4 sheets each. They contain 50000 records for each work book.I use DBI::Excel to fetch details for charts and graphs. But I could not continue, as this script takes almost 2.2 GB of RAM. Am I doing something wrong?

Here is my code.I hope you will understand what I am trying to achieve here.

use strict; use warnings; use DBD::Excel; use DBI; use Modern::Perl; use Data::Dumper; my %clients; my $preMonthDB = DBI->connect("DBI:Excel:file=March_15.xls") or die "Cannot connect: " . $DBI::errstr; my @preSheets = $preMonthDB->func('list_tables'); foreach my $preSheet (@preSheets) { say "$preSheet"; my $gender = $preMonthDB->prepare("SELECT sex,count(*) FROM " . $p +reSheet . " WHERE Scheduled_by <> 'subscriber' GROUP BY sex") or die "Couldn't prepare statement: " . $preMonthDB->e +rrstr; $gender->execute() # Execute the query or die "Couldn't execute statement: " . $gender->errstr; while (my @genders = $gender->fetchrow_array) { $clients{$preSheet}{"previous"}{$genders[0]} = $genders[1]; } my $tests = $preMonthDB->prepare("SELECT type_of_test, count(*) FR +OM " . $preSheet . " WHERE Scheduled_by <> 'subscriber' GROUP BY type +_of_test") or die "Couldn't prepare statement: " . $preMonthDB->e +rrstr; $tests->execute() # Execute the query or die "Couldn't execute statement: " . $tests->errstr; my $testRes = $tests->rows; while (my @test = $tests->fetchrow_array) { $clients{$preSheet}{"previous"}{"Tests"}{$test[0]} = $test[1]; } my $location = $preMonthDB->prepare("SELECT Location_Name, count(* +) FROM " . $preSheet . " WHERE Scheduled_by <> 'subscriber' GROUP BY +Location_Name") or die "Couldn't prepare statement: " . $preMonthDB->e +rrstr; $location->execute() # Execute the query or die "Couldn't execute statement: " . $location->errstr; my $locationRes = $location->rows; say "No Of Test Types are " . $locationRes; while (my @locations = $location->fetchrow_array) { $clients{$preSheet}{"previous"}{"Locations"}{$locations[0]} = +$locations[1]; } my $newPatient = $preMonthDB->prepare("SELECT count(*) FROM " . $p +reSheet . " WHERE Scheduled_by <> 'subscriber' AND New_patient = 'YES +'") or die "Couldn't prepare statement: " . $preMonthDB->e +rrstr; $newPatient->execute() # Execute the query or die "Couldn't execute statement: " . $newPatient->errst +r; my ($newPatientRes) = $newPatient->fetchrow_array; $clients{$preSheet}{"previous"}{"new"} = $newPatientRes; my $totalPatients = $preMonthDB->prepare("SELECT count(*) FROM " . + $preSheet . " WHERE Scheduled_by <> 'subscriber'") or die "Couldn't prepare statement: " . $preMonthDB->e +rrstr; $totalPatients->execute() # Execute the query or die "Couldn't execute statement: " . $totalPatients->er +rstr; my ($totalPatientsRes) = $totalPatients->fetchrow_array; $clients{$preSheet}{"previous"}{"total"} = $totalPatientsRes; $clients{$preSheet}{"previous"}{"existing"} = $totalPatientsRes - +$newPatientRes; } $preMonthDB->disconnect(); my $curMonthDB = DBI->connect("DBI:Excel:file=Appointments_April_15.xl +s") or die "Cannot connect: " . $DBI::errstr; my @curSheets = $curMonthDB->func('list_tables'); foreach my $curSheet (@curSheets) { my $gender = $curMonthDB->prepare("SELECT sex,count(*) FROM " . $c +urSheet . " WHERE Scheduled_by <> 'subscriber' GROUP BY sex") or die "Couldn't prepare statement: " . $curMonthDB->e +rrstr; $gender->execute() # Execute the query or die "Couldn't execute statement: " . $gender->errstr; while (my @genders = $gender->fetchrow_array) { $clients{$curSheet}{"current"}{$genders[0]} = $genders[1]; } my $tests = $curMonthDB->prepare("SELECT type_of_test, count(*) FR +OM " . $curSheet . " WHERE Scheduled_by <> 'subscriber' GROUP BY type +_of_test") or die "Couldn't prepare statement: " . $curMonthDB->e +rrstr; $tests->execute() # Execute the query or die "Couldn't execute statement: " . $tests->errstr; my $testRes = $tests->rows; while (my @test = $tests->fetchrow_array) { $clients{$curSheet}{"current"}{"Tests"}{$test[0]} = $test[1]; } my $location = $curMonthDB->prepare("SELECT Location_Name, count(* +) FROM " . $curSheet . " WHERE Scheduled_by <> 'subscriber' GROUP BY +Location_Name") or die "Couldn't prepare statement: " . $curMonthDB->e +rrstr; $location->execute() # Execute the query or die "Couldn't execute statement: " . $location->errstr; my $locationRes = $location->rows; say "No Of Test Types are " . $locationRes; while (my @locations = $location->fetchrow_array) { $clients{$curSheet}{"current"}{"Locations"}{$locations[0]} = $ +locations[1]; } my $newPatient = $curMonthDB->prepare("SELECT count(*) FROM " . $c +urSheet . " WHERE Scheduled_by <> 'subscriber' AND New_patient = 'YES +'") or die "Couldn't prepare statement: " . $curMonthDB->e +rrstr; $newPatient->execute() # Execute the query or die "Couldn't execute statement: " . $newPatient->errst +r; my ($newPatientRes) = $newPatient->fetchrow_array; $clients{$curSheet}{"current"}{"new"} = $newPatientRes; my $totalPatients = $curMonthDB->prepare("SELECT count(*) FROM " . + $curSheet . " WHERE Scheduled_by <> 'subscriber'") or die "Couldn't prepare statement: " . $curMonthDB->e +rrstr; $totalPatients->execute() # Execute the query or die "Couldn't execute statement: " . $totalPatients->er +rstr; my ($totalPatientsRes) = $totalPatients->fetchrow_array; $clients{$curSheet}{"current"}{"total"} = $totalPatientsRes; $clients{$curSheet}{"current"}{"existing"} = $totalPatientsRes - $ +newPatientRes; } $curMonthDB->disconnect(); print Dumper(%clients);
When I use single sheet with 100 rows, it does not take that much RAM. This means that DBI is not using unused rows. Point me in the right direction. Thanks

Replies are listed 'Best First'.
Re: DBI::Excel Memory usage
by pme (Monsignor) on Jun 18, 2015 at 13:08 UTC
Re: DBI::Excel Memory usage
by afoken (Chancellor) on Jun 18, 2015 at 17:31 UTC

    No help for your memory problem, but a hint for your code: change my $preMonthDB = DBI->connect("DBI:Excel:file=March_15.xls") or die "Cannot connect: " . $DBI::errstr; to my $preMonthDB = DBI->connect("DBI:Excel:file=March_15.xls",'','',{ RaiseError => 1}); and get rid of all of the other or die after DBI method calls. RaiseError enables implicit error checks, even where you forgot the manual error checks.

    Also get used to using placeholders instead of pasting literals into the SQL statement, to avoid SQL injection and to enable reuse of prepared statements.

    On the other hand, 2.2 GBytes is not that much of RAM on any recent machine. pme++ is right, avoiding DBI and DBD::Excel would avoid some overhead. The SQL statements you use here are quite trivial to replace with a few lines of perl, plus you could run all statistics in a single loop instead of running several times over all of the data.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)