in reply to Speeding up Spreadsheet::XLSX file load in UNIX

I think one of the reasons Spreadsheet::XLSX is so slow is that it doesn't use a proper XML parser, but parses the workbook(s) using regular expressions. On top of that, it loads:

use Archive::Zip;
use Spreadsheet::XLSX::Fmt2007;
use Data::Dumper;
use Spreadsheet::ParseExcel;

to be Spreadsheet::ParseExcel compatible (which it really is not).

In most spreadsheet modules, the whole spreadsheet (file) is read into memory, as there are several formats to be parsed before one can get to the actual data (ZIP, binary, ...). If the spreadsheet were readable directly from the file (like CSV, if you want to call that a spreadsheet), parsing could be a lot faster.
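
To illustrate what "readable directly from file" buys you, here is a minimal sketch that processes CSV one line at a time, so only the current line is held in memory. The helper name sum_column and the sample data are made up for this example, and the plain split on commas is a simplification: real CSV with quoted fields needs a proper CSV parser.

```perl
use strict;
use warnings;

# Hypothetical helper: sum one numeric column of simple CSV data,
# reading line by line instead of slurping the whole file.
sub sum_column {
    my ($fh, $col) = @_;
    my $sum = 0;
    while (my $line = <$fh>) {
        chomp $line;
        # Naive split: assumes no quoted or embedded commas.
        my @fields = split /,/, $line, -1;
        next unless defined $fields[$col];
        # Skip headers and other non-numeric cells.
        next unless $fields[$col] =~ /^-?\d+(?:\.\d+)?$/;
        $sum += $fields[$col];
    }
    return $sum;
}

# In-memory sample data standing in for a real file.
my $data = "name,qty\napple,3\npear,4\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print sum_column($fh, 1), "\n";
close $fh;
```

An XLSX file cannot be handled this way as-is, because the ZIP container has to be opened and the XML inside it parsed before any cell value is available.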

If someone would (re)write this module using a proper (fast) XML parser, preferably with the option to select whatever (working) XML parser is installed, that would really help this module. I really mean option here, as making the module require XML::LibXML would mean its death: XML::LibXML depends on libxml2, which might prove very hard to port to some non-standardish systems. So the module should choose between XML::LibXML, XML::Parser, XML::Parser::Lite, XML::Simple, or XML::Twig (and even those might depend on each other).
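
A minimal sketch of what such a selection could look like, assuming the module simply takes the first candidate that loads. The function name pick_xml_parser and the order of preference are my own choices for illustration, not anything Spreadsheet::XLSX provides:

```perl
use strict;
use warnings;

# Candidate parsers, most preferred first (the order is an assumption).
my @candidates = qw(
    XML::LibXML
    XML::Twig
    XML::Parser
    XML::Parser::Lite
    XML::Simple
);

# Hypothetical helper: return the name of the first module that can be
# loaded, or undef (in scalar context) if none of them is installed.
sub pick_xml_parser {
    my @modules = @_;
    for my $module (@modules) {
        # String eval, so a missing module does not abort the program.
        return $module if eval "require $module; 1";
    }
    return;
}

my $parser = pick_xml_parser(@candidates);
print defined $parser
    ? "Using $parser\n"
    : "No supported XML parser found\n";
```

This only picks a backend. The real work would be a thin layer that hides the API differences between the parsers from the rest of the module.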


Enjoy, Have FUN! H.Merijn

Re^2: Speeding up Spreadsheet::XLSX file load in UNIX
by jmcnamara (Monsignor) on Jul 05, 2011 at 11:49 UTC
    If someone would (re)write this module using a proper (fast) XML parser, preferably with the option to select whatever (working) XML parser is installed, that would really help this module

    I plan to write an Excel::Reader::XLSX module once Excel::Writer::XLSX reaches full compatibility with Spreadsheet::WriteExcel (in 2-3 months).

    The main aim will be a parser that is fast and has low memory usage. It will probably be based around XML::Twig.

    --
    John.

      YEAH! jmcnamara++. Once you have a prototype working, I'd like to check it in Spreadsheet::Read (and support it there).


      Enjoy, Have FUN! H.Merijn