in reply to Re: Out of Memory
in thread Out of Memory

Ok - I am looking at XLSX to see if I can make changes to remove stuff I dont need - since all i am trying to do in the production code is read the cell value and write it out to csv file

Can I eliminate all that excel style processing?
my $member_styles = $self -> {zip} -> memberNamed ('xl/styles. +xml'); my @styles = (); my %style_info = (); if ($member_styles) { foreach my $t ($member_styles -> contents =~ /xf\ numF +mtId="([^"]*)"(?!.*\/cellStyleXfs)/gsm) { #" # $t = $converter -> convert ($t) if $converter +; push @styles, $t; } my $default = $1 || ''; foreach my $t1 (@styles){ $member_styles -> contents =~ /numFmtId="$t1" formatCode=" +([^"]*)/; my $formatCode = $1 || ''; if ($formatCode eq $default || not($formatCode)){ if ($t1 == 9 || $t1==10){ $formatCode="0.00000%";} elsif ($t1 == 14){ $formatCode="m-d-yy";} else { $formatCode=""; } } $style_info{$t1} = $formatCode; $default = $1 || ''; } }

Also, I am considering reducing the cell object size by removing format and type hash keys So change

} my $cell =Spreadsheet::ParseExcel::Cell->new( Val => $v, Format => $thisstyle, Type => $type );
TO
} my $cell =Spreadsheet::ParseExcel::Cell->new( Val => $v );
Please let me know what you think

Replies are listed 'Best First'.
Re^3: Out of Memory
by vkon (Curate) on May 05, 2011 at 19:26 UTC
    I think this way.

    given that all you need is to get XLSX data - probably to get some kind of DB values - then, just do not use this CPAN module, just extract your data yourself,
    be your-application-centric.

    But before doing that, drop a letter to CPAN author - it could be that he is responsive and will provide you with a solution soon.
    Otherwise - just reuse his code of "unzipping" the content, and then use your own regular expression.

    But better than that - feed you resulting XML string into properly constructed XPATH expression - and feed this XPATH expression to Xml::LibXML - it is very efficient on XPath expressions, but other modules dealing with XPath also will suffice.
    (I - personally - have good experience with mentioned one, and TIMTOWTDI)

    This would be best way out of this situation - this is how I feel it.

    Is 40Mb - a size of ZIPped XLSX, or it is a size after unpacking?

      40MB is the Zipped XLSX file size - thanks for the guidance vkon! I am trying to use the code in the new method of the XLSX CPAN module and stick my functionality in there so I dont have to store anything in memory
      OK - I THINK I GOT THE PROBLEM - In XLSX.pm the method new issues the following read foreach ($member_sheet -> contents =~ /(\<.*?\/?\>|.*?(?=\<))/g) { This is trying to cache the whole worksheet at a time; one of my worksheets in the XLSX file that is throwing the "Out of Memory" has a million rows Is there a way I can change this to perform line by line reading?
        optimizing such kind of constructs - is one of the ways in your situation.

        The answer to your question - 'yes',