in reply to memory usage Spreadsheet::ParseExcel

It's a known issue that S::PE objects contain cyclic references, so they never get completely freed. The simplest workaround is to create each object in a separate process (i.e. fork, create the object, process the file, end the child process). The other issue you've found, that the entire document is parsed into memory, is solved by using the CellHandler and NotSetCell attributes (or Spreadsheet::ParseExcel::Stream), but there is no guarantee that the cells are parsed "in order." They seem to be parsed in order when the spreadsheet was created by Excel, but not, e.g., when it was created by Spreadsheet::WriteExcel without 'compatibility_mode' and the rows/cells were written out of order.
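
Here is a minimal sketch of both ideas combined, assuming a Unix-like system where a forking open works. The names in_child and emit_cells and the tab-separated record format are my own inventions for illustration, not part of the module; only new(), the CellHandler and NotSetCell options, parse() and value() come from Spreadsheet::ParseExcel.

```perl
use strict;
use warnings;
use POSIX ();

# Run $producer in a child process and hand every line it prints to
# $on_record in the parent. Whatever the child allocates (including a
# parser with cyclic references) is released when the child exits.
sub in_child {
    my ($producer, $on_record) = @_;

    # Forking open: the parent reads what the child writes to STDOUT.
    my $pid = open(my $from_child, '-|');
    die "Cannot fork: $!" unless defined $pid;

    if ($pid == 0) {    # child
        binmode(STDOUT, ':encoding(UTF-8)');
        my $ok = eval { $producer->(); 1 };
        warn $@ unless $ok;
        close(STDOUT);                 # flush before the hard exit below
        POSIX::_exit($ok ? 0 : 1);     # skip END blocks inherited from the parent
    }

    binmode($from_child, ':encoding(UTF-8)');
    while (my $line = <$from_child>) {
        chomp $line;
        $on_record->(split /\t/, $line, 4);   # sheet, row, col, value
    }
    close($from_child) or die "Child failed (status $?)\n";
    return;
}

# Print one "sheet<TAB>row<TAB>col<TAB>value" line per cell (assumed format).
sub emit_cells {
    my ($file) = @_;
    require Spreadsheet::ParseExcel;
    my $parser = Spreadsheet::ParseExcel->new(
        CellHandler => sub {
            my ($workbook, $sheet_index, $row, $col, $cell) = @_;
            my $value = $cell->value();
            $value = '' unless defined $value;
            $value =~ s/[\t\r\n]/ /g;         # keep one record per line
            print join("\t", $sheet_index, $row, $col, $value), "\n";
        },
        NotSetCell => 1,    # do not keep the cells in the workbook object
    );
    defined $parser->parse($file) or die "Cannot parse $file\n";
    return;
}
```

It would be called as in_child(sub { emit_cells('big.xls') }, sub { my ($sheet, $row, $col, $value) = @_; ... }), where 'big.xls' is a placeholder. The records reach the parent in whatever order the parser produced them.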

Re^2: memory usage Spreadsheet::ParseExcel
by Anonymous Monk on May 14, 2012 at 11:34 UTC

    Thanks for the input. Spreadsheet::ParseExcel is a good module and gets the job done. Thanks for that, too.

    For those who follow: I experimented with writing to DBM::Deep, but it is slow. DBI->connect with AutoCommit => 0 is faster, but that uses memory, too.

    The strategy that I followed was to write files of sheet cells in the cell_handler and then read these into a hash to get the cells in row and column order. The parse of a 131MB xls plus the sheet hash would fit into 4GB of memory, barely. If the xls or its sheets get much bigger, I will have to fork the parsing into a separate child process in order to free up the memory used by the parser. A 131MB xls consisting of one sheet ain't going to fit in 4GB in one process.
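
    A rough sketch of the read-back half of that strategy, assuming the cell_handler wrote one "row<TAB>col<TAB>value" line per cell to a file per sheet. That record format and the sub names are made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Load "row<TAB>col<TAB>value" records (assumed format) into a hash
    # of hashes: $cells->{$row}{$col} = $value.
    sub load_cells {
        my ($path) = @_;
        open(my $in, '<:encoding(UTF-8)', $path)
            or die "Cannot open $path: $!";
        my %cells;
        while (my $line = <$in>) {
            chomp $line;
            my ($row, $col, $value) = split /\t/, $line, 3;
            $cells{$row}{$col} = $value;
        }
        close($in);
        return \%cells;
    }

    # Visit the cells row by row and column by column, whatever order
    # they were written in. The sorts must be numeric, not string sorts.
    sub each_cell_in_order {
        my ($cells, $visit) = @_;
        for my $row (sort { $a <=> $b } keys %$cells) {
            for my $col (sort { $a <=> $b } keys %{ $cells->{$row} }) {
                $visit->($row, $col, $cells->{$row}{$col});
            }
        }
        return;
    }
    ```

    The hash still holds a whole sheet at once, so this only removes the parser's share of the memory, not the sheet's.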

    Now the simplest solution is to increase the computer's available core and fight the out of memory battle with $27. But alas, the liberals and the marxists in the White House have gotten this economy so screwed up with dreams of green energy that I cannot afford to buy a new computer (mine is at its 4GB limit) and pay for food, shelter, clothing, and taxes. I will of course proceed to proceed!

      Did you try parsing using the CellHandler and NotSetCell attributes, and find that the cells were parsed out of order? You don't say whether you tried or not; you just seem to fear that they might be parsed out of order. You should try it before you reject the notion. As I said earlier, they will likely be parsed in order unless the spreadsheet was created by something like Spreadsheet::WriteExcel and the cells were written out of order.
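
      One way to try it, as a sketch only: a handler that counts how many cells arrive before their predecessor in (sheet, row, column) order. The sub names here are hypothetical; the five handler arguments are the ones Spreadsheet::ParseExcel passes to a CellHandler.

      ```perl
      use strict;
      use warnings;

      # Returns a CellHandler-compatible callback and a getter for the
      # number of cells seen out of (sheet, row, column) order.
      sub make_order_checker {
          my @last = (-1, -1, -1);
          my $out_of_order = 0;
          my $handler = sub {
              my ($workbook, $sheet_index, $row, $col, $cell) = @_;
              my @now = ($sheet_index, $row, $col);
              for my $i (0 .. 2) {
                  next if $now[$i] == $last[$i];
                  # The first component that differs decides the ordering.
                  $out_of_order++ if $now[$i] < $last[$i];
                  last;
              }
              @last = @now;
              return;
          };
          return ($handler, sub { $out_of_order });
      }

      # Parse $file without storing any cells and report the count.
      sub count_out_of_order {
          my ($file) = @_;
          require Spreadsheet::ParseExcel;
          my ($handler, $count) = make_order_checker();
          my $parser = Spreadsheet::ParseExcel->new(
              CellHandler => $handler,
              NotSetCell  => 1,
          );
          defined $parser->parse($file) or die "Cannot parse $file\n";
          return $count->();
      }
      ```

      If count_out_of_order returns 0 for your files, the cells came in order and no sorting step is needed for them.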