in reply to Re^2: Out of Memory
in thread Out of Memory

I think this way.

given that all you need is to get XLSX data - probably to get some kind of DB values - then, just do not use this CPAN module, just extract your data yourself,
be your-application-centric.

But before doing that, drop a letter to CPAN author - it could be that he is responsive and will provide you with a solution soon.
Otherwise - just reuse his code of "unzipping" the content, and then use your own regular expression.

But better than that - feed you resulting XML string into properly constructed XPATH expression - and feed this XPATH expression to Xml::LibXML - it is very efficient on XPath expressions, but other modules dealing with XPath also will suffice.
(I - personally - have good experience with mentioned one, and TIMTOWTDI)

This would be best way out of this situation - this is how I feel it.

Is 40Mb - a size of ZIPped XLSX, or it is a size after unpacking?

Replies are listed 'Best First'.
Re^4: Out of Memory
by ETLTCHFIG (Novice) on May 05, 2011 at 19:45 UTC
    40MB is the Zipped XLSX file size - thanks for the guidance vkon! I am trying to use the code in the new method of the XLSX CPAN module and stick my functionality in there so I dont have to store anything in memory
Re^4: Out of Memory
by ETLTCHFIG (Novice) on May 05, 2011 at 20:31 UTC
    OK - I THINK I GOT THE PROBLEM - In XLSX.pm the method new issues the following read foreach ($member_sheet -> contents =~ /(\<.*?\/?\>|.*?(?=\<))/g) { This is trying to cache the whole worksheet at a time; one of my worksheets in the XLSX file that is throwing the "Out of Memory" has a million rows Is there a way I can change this to perform line by line reading?
      optimizing such kind of constructs - is one of the ways in your situation.

      The answer to your question - 'yes',