bitingduck has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks, I have a problem that isn't necessarily a Perl problem, so much as it manifests itself when I'm using Perl, and I'm hoping that there may be a relatively painless Perl solution.

For many years I've been getting monthly data files in Excel from a little company in Seattle. For years I've parsed them into a database, using first Spreadsheet::ParseExcel, then as the format changed using Spreadsheet::XLSX, and more recently using Spreadsheet::ParseXLSX. And despite the changes from .xls to .xlsx and a few format changes, things have been mostly good.

Some time around a year ago the format didn't change, but the Spreadsheet::ParseXLSX stopped reading the file correctly. Or maybe more accurately, whoever wrote the file stopped writing it correctly, and now Spreadsheet::ParseXLSX doesn't read it correctly. The particular issue where it first manifests is in reading a cell with "Month Year" in the contents (e.g. "December 2023"). When I read the cell with

 my $mycell=$worksheet->get_cell();

and then try to evaluate it with either

my $value=$mycell->value(); # or my $unformatted=$mycell->unformatted();

it returns "0" or "0.0" respectively. The catch is that if I simply open the file in Excel and hit "save", or open it in Apple's Numbers and export it back to XLSX, it then parses correctly, like it always used to. So the Excel file opener is fairly forgiving of whatever changed, but Spreadsheet::ParseXLSX isn't, and I don't see an obvious way to get at the content to debug. Using Data::Dumper on $mycell also shows zeros where it should give the month and year. I haven't tried other cells - once it barfs, I remember I have to open/save and that presumably fixes all the cells.

This is something I can live with if I have to, since I only have to deal with the files a few times a year and can just open/save them quickly, but it seems so wrong. I'm not sure it's a widespread problem - someone else is writing the file using a program other than Excel so I can read it with a program other than Excel. The smart thing would be if the generator produced CSV, but that's not something I can influence.

Replies are listed 'Best First'.
Re: Spreadsheet::ParseXLSX returning zero
by swl (Prior) on Dec 05, 2023 at 08:30 UTC

    This might not work but what do the cell values look like when the file is parsed using Excel::ValueReader::XLSX?

    If you parse a lot of XLSX files and only need the values then Excel::ValueReader::XLSX is substantially faster at parsing files. Recent versions of Spreadsheet::Read are also able to use it as a backend if you want to be able to switch between parsers.

      Well that seems to have worked!

      Thanks!

      I didn't know about that module, and given how much else it installed, it appears to be relatively independent of Spreadsheet::(.*)

      It looks like I have some rewriting to do - I'll come back and post if I can figure out what the source of the problem with the .xlsx file is.

Re: Spreadsheet::ParseXLSX returning zero
by soonix (Chancellor) on Dec 05, 2023 at 09:16 UTC
    Several years ago, I had a similiar problem, because the spreadsheed module I used at the time was not able to handle background colo(u)rs. My solution involved creating a macro in OpenOffice, and in the Perl script I had
    system ('C:\Program Files (x86)\OpenOffice 4\program\scalc.exe', '-headless', 'macro:///Standard.Module1.ConvertToHtml("' . $xlsx . '","' . +$file->canonpath . '")') if ! $file->exists;
    and using the resulting file. Sadly, I don't have the macro anymore, but I remember it was very short (some "open" and "save as"), similiar to simply open and hit "save".

    I don't remember why I used OpenOffice instead of Excel. Perhaps it was easier to call the parameterized macro from Perl, but maybe not...

      You may have used OpenOffice instead of Excel because some older version of Excel (2008?) was prone to handling dates wrong. That's a different problem that I've run into with Excel in the past and worked around.
        Ah, yes, for the optimist the glass is half full, for the pessimist it's half empty, and for Excel it's January 2nd (or February 1st, depending on locale)...