Re: memory usage Spreadsheet::ParseExcel
by ww (Archbishop) on May 13, 2012 at 12:24 UTC
Actually, it's inherent that Perl releases the memory once it's no longer in use (ref count = 0)...
But, it doesn't release the memory (deallocate it) back to the OS; it holds it for reuse.
So, code and data (small samples, please) to illustrate your problem would be helpful. (Of course, jmcnamara may have an answer without that....)
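A minimal sketch of that ref-count behaviour (the Tracker package and names are made up for illustration): DESTROY fires the moment the count hits zero, but a circular reference keeps the count above zero, so the structure is never destroyed until interpreter shutdown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical package: counts how many of its objects have been destroyed.
package Tracker;
our $destroyed = 0;
sub new     { my $class = shift; bless { @_ }, $class }
sub DESTROY { $destroyed++ }

package main;
{
    my $obj = Tracker->new( name => 'scoped' );
}                               # ref count hits zero here: DESTROY runs

my $cyclic = Tracker->new( name => 'cycle' );
$cyclic->{self} = $cyclic;      # circular reference
undef $cyclic;                  # count never reaches zero: NOT destroyed

print "destroyed so far: $Tracker::destroyed\n";   # prints 1, not 2
```

And even for the object that is destroyed, the memory goes back to perl's own allocator for reuse, not back to the OS.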
I looked at jmcnamara but no help!
NOTE: I am hashing one sheet at a time because nowhere have I found it stated that the rows and columns in the xls passed through the cell_handler are guaranteed to arrive in sequential order.
NOTE: When executing the code below, where @workbook_zip_FNS is a list of xls files, I get an out-of-memory error. But when I run the script on each xls file individually (from a batch file that passes each xls into the script, replacing the for (my $i=0; $i < scalar(@workbook_zip_FNS); $i++) iterator), each run goes to completion without error.
my $cell_outputfh;
my $xls_parser = Spreadsheet::ParseExcel->new(
    CellHandler => \&HH_cell_handler,
    NotSetCell  => 1
);
for ( my $i = 0; $i < scalar(@workbook_zip_FNS); $i++ ) {
    open( $cell_outputfh, ">:encoding(iso-8859-1)", $ribbon_srcgeo[$i].".csv" )
        or die " open failed on $ribbon_srcgeo[$i].csv $!";
    my $workbook = $xls_parser->parse($unzipped_xls_FNS[$i]);
    if ( !defined $workbook ) { die $xls_parser->error(), ".\n"; }
    print "\n";
    close $cell_outputfh;
    for my $worksheet ( $workbook->worksheets() ) {
        my $wsname = $worksheet->get_name();
        print $wsname."*";
        open( $cell_outputfh, "<:encoding(iso-8859-1)", $ribbon_srcgeo[$i].".csv" )
            or die " open failed on $ribbon_srcgeo[$i].csv $!";
        my $HofA_sheet_rows;
        while (<$cell_outputfh>) {
            my @fields = split /","/, $_;
            $fields[0] =~ s/^"//;
            $fields[3] =~ s/"$//;
            if ( $fields[0] eq $wsname ) {
                $HofA_sheet_rows{$wsname}[$fields[1]][$fields[2]] = $fields[3];
            }
        }
        undef $HofA_sheet_rows;
        close $cell_outputfh;
    }
    undef $workbook;
}

sub HH_cell_handler {
    my $workbook    = $_[0];
    my $sheet_index = $_[1];
    my $row         = $_[2];
    my $col         = $_[3];
    my $cell        = $_[4];
    my $worksheet = $workbook->worksheet($sheet_index);
    my $wsname    = $worksheet->get_name();
    my $value = '"'.$wsname.'"'.','.'"'.$row.'"'.','.'"'.$col.'"'.','.'"'.$cell->value().'"'."\n";
    print $cell_outputfh $value;
}
I have been experimenting with a number of different ways to solve this out of memory problem.
The code snippet above has a bug: my $HofA_sheet_rows; should be my %HofA_sheet_rows; and undef $HofA_sheet_rows; should be undef %HofA_sheet_rows;.
I am rerunning to see if the problem raises its head again. I will post my results when the run is complete.
Re: memory usage Spreadsheet::ParseExcel
by runrig (Abbot) on May 13, 2012 at 21:43 UTC
It's a known issue that S::PE objects have cyclic references, so objects never get completely freed. The simplest solution is to create each object in a separate process (i.e. fork, create object, process, end child process). The other issue of parsing the entire document to memory as you've found is solved by using the CellHandler and NotSetCell attributes (or with Spreadsheet::ParseExcel::Stream), but it is not guaranteed that the cells are parsed "in order." They seem to be parsed in order when the spreadsheet is created by Excel, but not, e.g., when the spreadsheet is created by Spreadsheet::WriteExcel without using 'compatibility_mode' and the rows/cells are written out of order.
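A sketch of that fork-per-workbook idea, assuming each file can be processed independently (the sub name and handler are hypothetical, not part of the module's API):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: parse each workbook in a short-lived child process so the
# parser's cyclic structures die with the child and the memory goes
# back to the OS. parse_in_child() is a made-up name for illustration.
sub parse_in_child {
    my ( $xls_file, $handler ) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {                        # --- child ---
        require Spreadsheet::ParseExcel;      # load only where it is used
        my $parser   = Spreadsheet::ParseExcel->new;
        my $workbook = $parser->parse($xls_file)
            or die $parser->error(), "\n";
        $handler->($workbook);                # do the real per-file work
        exit 0;                               # child exit frees everything
    }
    waitpid( $pid, 0 );                       # --- parent ---
    return $? >> 8;                           # child's exit status
}

# hypothetical usage:
# parse_in_child( $_, \&process_workbook ) for @unzipped_xls_FNS;
```

The parent never loads a workbook at all, so its footprint stays flat no matter how many files are in the list.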
Thanks for the input. Spreadsheet::ParseExcel is a good module and gets the job done. Thanks for that, too.
For those that follow: I experimented with writing to a DBM::Deep file, but it is slow; DBI->connect is faster with AutoCommit => 0, but that uses memory, too.
The strategy I followed was to write files of sheet cells from the cell_handler and then read them back into a hash to get the cells sequential by row and column. A 131MB xls parse plus the sheet hash would fit into 4GB of memory, barely. If the xls or its sheets were much bigger, I would have to fork the parsing to a separate child process in order to free up the memory used by the parser. A 131MB xls consisting of one sheet ain't going to fit in 4GB in one process.
Now the simplest solution is to increase the computer's available memory and fight the out-of-memory battle with $27. But alas, the liberals and the marxists in the Whitehouse have gotten this economy so screwed up with dreams of green energy that I cannot afford to buy a new computer (mine is at its 4GB limit) and still pay for food, shelter, clothing, and taxes. I will of course proceed to proceed!
Did you try parsing using the CellHandler and NotSetCell attributes, and find that they were parsed out of order? You don't say whether you tried or not, you just seem to fear that they might be parsed out of order. You should try it before you reject the notion. As I said earlier, they will likely be parsed in order unless the spreadsheet was created by something like Spreadsheet::WriteExcel and the cells were written out of order.
Re: memory usage Spreadsheet::ParseExcel
by BrowserUk (Patriarch) on May 14, 2012 at 12:51 UTC
Do the Monks have any idea how to free the memory used ...
If, as appears to be the case from the follow-on discussion, the problem is circular references preventing the data structures from being destroyed, then running a depth-first function to free them might resolve the problem.
See how you get on with this:
sub recursiveFree {
my $ref = shift;
if( ref $ref eq 'ARRAY' ) {
recursiveFree( $ref->[ $_ ] ) for 0 .. $#{ $ref };
}
elsif( ref $ref eq 'HASH' ) {
recursiveFree( $ref->{ $_ } ) for keys %{ $ref };
}
elsif( ref $ref and ref $ref ne 'SCALAR' ) {
warn "Unhandled reftype: ", ref( $ref );
}
undef $ref;
return;
}
recursiveFree( $xls );
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
Tried it, no help, but great thanks anyway.
"If the referenced object has been blessed into a package, then that package name is returned instead."
I added the Spreadsheet::ParseExcel packages
::Workbook
::FmtDefault
::Format
::Font
::Worksheet
as refs for recursion. None seemed to free memory, and ::Worksheet warned on deep recursion and never came back up out of the stack.
"If the referenced object has been blessed into a package, then that package name is returned instead."
Switching from ref to Scalar::Util::reftype() would fix that.
and ::Worksheet warned on deep recursion and never came back up out of the stack
Using Scalar::Util::refaddr() with a %seen hash could prevent it recursing up its own bum.
The code I posted was by way of example. Just a blind tweak of a depth-first recursive structure traversal I had kicking around.
I'd have a go at making the above tweaks if I had a suitable example .xls file.
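For what it's worth, the two tweaks together might look like this (a sketch only, not tested against an S::PE object; the blessed package name in the demo is made up):

```perl
use strict;
use warnings;
use Scalar::Util qw( reftype refaddr );

# reftype() sees through blessed packages, and a %seen hash keyed on
# refaddr() stops the deep recursion on cycles.
sub recursiveFree {
    my ( $ref, $seen ) = @_;
    $seen //= {};
    return unless ref $ref;
    return if $seen->{ refaddr $ref }++;      # been here before: a cycle
    my $type = reftype $ref;
    if ( $type eq 'ARRAY' ) {
        recursiveFree( $ref->[ $_ ], $seen ) for 0 .. $#{ $ref };
        @{ $ref } = ();
    }
    elsif ( $type eq 'HASH' ) {
        recursiveFree( $ref->{ $_ }, $seen ) for keys %{ $ref };
        %{ $ref } = ();
    }
    elsif ( $type ne 'SCALAR' and $type ne 'CODE' ) {
        warn "Unhandled reftype: $type";
    }
    return;
}

# A blessed, self-referential structure: plain ref() would miss the HASH,
# and without %seen the cycle would recurse forever.
my $node = bless { name => 'a' }, 'My::Hypothetical::Node';
$node->{self} = $node;
recursiveFree( $node );
print scalar( keys %$node ), "\n";            # prints 0: emptied, no loop
```

Emptying each container in place (rather than just letting $ref go out of scope) is what actually breaks the cycles so the ref counts can fall to zero.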