in reply to Re: memory usage Spreadsheet::ParseExcel
in thread memory usage Spreadsheet::ParseExcel

I looked at jmcnamara but found no help! NOTE: I am hashing one sheet at a time because nowhere have I found it stated that the rows and columns in the xls are passed through the cell_handler in strictly sequential order. NOTE: When executing the code below, where @workbook_zip_FNS is a list of xls files, I run out of memory. But when I run it on each xls file individually (via a batch file that passes each xls into the script, replacing the for (my $i=0; $i < scalar(@workbook_zip_FNS); $i++) loop), each run completes without error.
my $cell_outputfh;
my $xls_parser = Spreadsheet::ParseExcel->new(
    CellHandler => \&HH_cell_handler,
    NotSetCell  => 1
);
for ( my $i = 0; $i < scalar(@workbook_zip_FNS); $i++ ) {
    open( $cell_outputfh, ">:encoding(iso-8859-1)", $ribbon_srcgeo[$i] . ".csv" )
        or die " open failed on $ribbon_srcgeo[$i].csv $!";
    my $workbook = $xls_parser->parse( $unzipped_xls_FNS[$i] );
    if ( !defined $workbook ) {
        die $xls_parser->error(), ".\n";
    }
    print "\n";
    close $cell_outputfh;
    for my $worksheet ( $workbook->worksheets() ) {
        my $wsname = $worksheet->get_name();
        print $wsname . "*";
        open( $cell_outputfh, "<:encoding(iso-8859-1)", $ribbon_srcgeo[$i] . ".csv" )
            or die " open failed on $ribbon_srcgeo[$i].csv $!";
        my $HofA_sheet_rows;
        while (<$cell_outputfh>) {
            my @fields = split /","/, $_;
            $fields[0] =~ s/^"//;
            $fields[3] =~ s/"$//;
            if ( $fields[0] eq $wsname ) {
                $HofA_sheet_rows{$wsname}[ $fields[1] ][ $fields[2] ] = $fields[3];
            }
        }
        undef $HofA_sheet_rows;
        close $cell_outputfh;
    }
    undef $workbook;
}

sub HH_cell_handler {
    my $workbook    = $_[0];
    my $sheet_index = $_[1];
    my $row         = $_[2];
    my $col         = $_[3];
    my $cell        = $_[4];
    my $worksheet   = $workbook->worksheet($sheet_index);
    my $wsname      = $worksheet->get_name();
    my $value = '"' . $wsname . '"' . ',' . '"' . $row . '"' . ',' . '"' . $col . '"' . ',' . '"' . $cell->value() . '"' . "\n";
    print $cell_outputfh $value;
}
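As an aside, one way to check whether cells really arrive in row/column order for a given file is to remember the last (row, col) seen per sheet inside the handler and count any cell that goes backwards. This is only a sketch against the same CellHandler callback signature used above; the counter names and the file name are mine, and a clean run only tells you about that one workbook, not about the module in general.

use strict;
use warnings;
use Spreadsheet::ParseExcel;

my %last;            # sheet_index => [ last_row, last_col ]
my $out_of_order = 0;

sub order_check_handler {
    my ( $workbook, $sheet_index, $row, $col, $cell ) = @_;
    my $prev = $last{$sheet_index} || [ -1, -1 ];
    # Count any cell that arrives at an earlier row, or at the same row
    # but a column we have already passed.
    if ( $row < $prev->[0] or ( $row == $prev->[0] and $col <= $prev->[1] ) ) {
        $out_of_order++;
    }
    $last{$sheet_index} = [ $row, $col ];
}

my $parser = Spreadsheet::ParseExcel->new(
    CellHandler => \&order_check_handler,
    NotSetCell  => 1,
);
$parser->parse('some_workbook.xls') or die $parser->error(), "\n";
print "cells seen out of order: $out_of_order\n";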

Re^3: memory usage Spreadsheet::ParseExcel
by ww (Archbishop) on May 13, 2012 at 15:03 UTC
    The reference was to the Monk who is the module's author and current maintainer (extender, enhancer, and general expert) and who frequently responds directly to questions about the use of his contributions.

    You might have known that had you:

    1. Read the module's documentation (perldoc Spreadsheet::ParseExcel) -- something you can (almost certainly) do at your own terminal.
       
    2. Construed the link as something other than a direct answer and searched CPAN for the id posted on his personal page here.

    Update: Correcting the title I attributed to jmcnamara

      I did not misconstrue anything. I searched through all 966 posts but found no help, except that I did confirm that the module "parses the entire spreadsheet document into memory."

      Nowhere in the 966 posts was it said that rows and columns are in sequence when passed through the cell_handler.

      I experimented and confirmed that perl does not release memory to a free pool when the Spreadsheet::ParseExcel object reference is undef'd. So the memory used during the parse is sitting there unused and unallocatable to other variables.
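      For anyone who wants to watch this happen, checking the process's resident set size around the parse and the undef makes it visible. The sketch below is Linux-only (it reads /proc/self/status) and big_workbook.xls is just a placeholder name; on other platforms top or Task Manager shows the same thing.

      use strict;
      use warnings;
      use Spreadsheet::ParseExcel;

      # Report the current resident set size in kB (Linux only).
      sub rss_kb {
          open my $fh, '<', '/proc/self/status' or return 'n/a';
          while (<$fh>) {
              return $1 if /^VmRSS:\s+(\d+)\s+kB/;
          }
          return 'n/a';
      }

      print "before parse: ", rss_kb(), " kB\n";
      my $parser   = Spreadsheet::ParseExcel->new();
      my $workbook = $parser->parse('big_workbook.xls') or die $parser->error(), "\n";
      print "after parse:  ", rss_kb(), " kB\n";

      undef $workbook;
      undef $parser;
      print "after undef:  ", rss_kb(), " kB\n";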

      my $cell_outputfh;
      my $dot_ctr = 0;
      for ( my $i = 0; $i < scalar(@workbook_zip_FNS); $i++ ) {
          my $xls_parser = Spreadsheet::ParseExcel->new(
              CellHandler => \&HH_cell_handler,
              NotSetCell  => 1
          );
          open( $cell_outputfh, ">:encoding(iso-8859-1)", $ribbon_srcgeo[$i] . ".csv" )
              or die " open failed on $ribbon_srcgeo[$i].csv $!";
          my $workbook = $xls_parser->parse( $unzipped_xls_FNS[$i] );
          if ( !defined $workbook ) {
              die $xls_parser->error(), ".\n";
          }
          print "\n";
          close $cell_outputfh;

          my @wsnames;
          my @row_mins;
          my @row_maxs;
          my @col_mins;
          my @col_maxs;
          for my $worksheet ( $workbook->worksheets() ) {
              push @wsnames, $worksheet->get_name();
              my ( $row_min, $row_max ) = $worksheet->row_range();
              my ( $col_min, $col_max ) = $worksheet->col_range();
              push @row_mins, $row_min;
              push @row_maxs, $row_max;
              push @col_mins, $col_min;
              push @col_maxs, $col_max;
          }
          undef $workbook;
          undef $xls_parser;

          for ( my $j = 0; $j < scalar(@wsnames); $j++ ) {
              print $wsnames[$j] . "*";
              open( $cell_outputfh, "<:encoding(iso-8859-1)", $ribbon_srcgeo[$i] . ".csv" )
                  or die " open failed on $ribbon_srcgeo[$i].csv $!";
              my %HofA_sheet_cells;    # make a hash out of the cells
              while (<$cell_outputfh>) {
                  my @fields = split /","/, $_;
                  $fields[0] =~ s/^"//;
                  $fields[3] =~ s/"$//;
                  if ( $fields[0] eq $wsnames[$j] ) {
                      $HofA_sheet_cells{ $wsnames[$j] }[ $fields[1] ][ $fields[2] ] = $fields[3];
                      $dot_ctr++;
                      if ( $dot_ctr == 1000 ) {
                          $| = 1;    # flush the buffer
                          print "*";
                          $dot_ctr = 0;
                      }
                  }
              }
              undef %HofA_sheet_cells;
              close $cell_outputfh;
          }
      }

      sub HH_cell_handler {
          my $workbook    = $_[0];
          my $sheet_index = $_[1];
          my $row         = $_[2];
          my $col         = $_[3];
          my $cell        = $_[4];
          my $worksheet   = $workbook->worksheet($sheet_index);
          my $wsname      = $worksheet->get_name();
          my $value = '"' . $wsname . '"' . ',' . '"' . $row . '"' . ',' . '"' . $col . '"' . ',' . '"' . $cell->value() . '"' . "\n";
          print $cell_outputfh $value;
      }
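      Since the per-file batch runs complete without trouble, another option (on systems where fork is available) is to keep the loop but parse each workbook in a child process, so that everything the parse allocated goes back to the OS when the child exits. This is only a sketch; process_one_workbook() is a hypothetical helper standing in for the per-file body of the loop above.

      use strict;
      use warnings;

      # @workbook_zip_FNS is assumed to be the same list of xls files as above;
      # process_one_workbook($i) would hold the per-file open/parse/convert code.
      for my $i ( 0 .. $#workbook_zip_FNS ) {
          my $pid = fork();
          die "fork failed: $!" unless defined $pid;
          if ( $pid == 0 ) {
              # Child: parse one workbook and write its CSV, then exit.
              # All memory used by the parse is released to the OS on exit.
              process_one_workbook($i);
              exit 0;
          }
          waitpid( $pid, 0 );    # parent waits so only one parse runs at a time
      }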
Re^3: memory usage Spreadsheet::ParseExcel
by Anonymous Monk on May 13, 2012 at 14:00 UTC
    I have been experimenting with a number of different ways to solve this out-of-memory problem. The code snippet above has a bug: my $HofA_sheet_rows; should be my %HofA_sheet_rows; and undef $HofA_sheet_rows; should be undef %HofA_sheet_rows;. I am rerunning to see if the problem raises its head again. I will post my results when the run is complete.
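    For what it's worth, this is exactly the kind of sigil mix-up that use strict catches at compile time. A minimal illustration of the pattern (not the code above, and the sheet name is just a placeholder):

    use strict;
    use warnings;

    my $HofA_sheet_rows;                       # declares only the scalar $HofA_sheet_rows
    $HofA_sheet_rows{'Sheet1'}[0][0] = 'x';    # uses the undeclared hash %HofA_sheet_rows
    # Under strict this refuses to compile:
    #   Global symbol "%HofA_sheet_rows" requires explicit package name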