divyahk has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am a newbie to Perl and have a problem with parsing an Excel file. I am using Spreadsheet::XLSX for reading data from an Excel file and extract each row of data, one row at a time. The code (partial) is as follows:

use strict; use warnings; use Spreadsheet::Read; use Data::Dumper; my $excel = Spreadsheet::XLSX->new (‘xyz.xlsx'); if (!defined $excel) { die "Error: " . $excel->error() . "\n"; } print “Read with Spreadsheet::XSLX\n"; print Dumper($excel) . “\n";

The output here shows that it’s reading the correct workbook as it can recognise the correct worksheet called “Paper.1”, and I think I am using the correct version of the Excel parser as it is indeed the .xlsx variety of Excel. Why are the cells not being read then? (As shown by cells =>undef)

Read with Spreadsheet::XLSX : $VAR1 = bless( { 'Worksheet' => [ bless( { 'DefColWidth' => '8.43', 'MinCol' => 0, 'MaxRow' => 0, 'MinRow' => 0, 'path' => 'worksheets/sheet +1.xml', 'MaxCol' => 0, 'Name' => 'Paper.1', 'Cells' => undef }, 'Spreadsheet::ParseExcel:: +Worksheet' ) ], 'Flg1904' => 0, 'FmtClass' => bless( {}, 'Spreadsheet::XLSX::Fmt2007' + ), 'SheetCount' => 1 }, 'Spreadsheet::ParseExcel::Workbook' );

Replies are listed 'Best First'.
Re: Spreadsheet::XSLX parsing workbook, but not retrieving values from cells
by Tux (Canon) on Sep 17, 2014 at 12:38 UTC

    Spreadsheet::Read is a wrapper module that does not parse XLSX format itself, but delegates that task to a dedicated parser. Though for this purpose Spreadsheet::XLSX is supported, it is highly recommended to (install and) use Spreadsheet::ParseXLSX instead: Spreadsheet::XLSX is considered buggy and is not maintained anymore and Spreadsheet::ParseXLSX is.

    Try to make the switch, read the docs of the modules, and when it still doesn't work, come back for advice. NOBODY will be able to help you fix bugs in Spreadsheet::XLSX. (well, maybe somebody might be able to, but it would be a complete waste of time).


    Enjoy, Have FUN! H.Merijn
Re: Spreadsheet::XLSX parsing workbook, but not retrieving values from cells
by Athanasius (Cardinal) on Sep 17, 2014 at 11:47 UTC

    Hello divyahk, and welcome to the Monastery!

    There are two problems with your code as posted:

    1. You have use Spreadsheet::Read; instead of use Spreadsheet::XLSX;.

    2. In various places single- and double-quote characters are non-ASCII and so not recognised by the Perl parser.

    With these issues fixed, your code works fine for me:

    Perhaps the file being read is actually empty after all?

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Hi Athanasius, Thanks for the reply, but the code still doesn't work for me.

      1. According to this page at CPAN

      "Spreadsheet::Read tries to transparently read *any* spreadsheet and return its content in a universal manner independent of the parsing module that does the actual spreadsheet scanning.

      For OpenOffice and/or LibreOffice this module uses Spreadsheet::ReadSXC

      For Microsoft Excel this module uses Spreadsheet::ParseExcel, Spreadsheet::ParseXLSX, or Spreadsheet::XLSX.

      For CSV this module uses Text::CSV_XS or Text::CSV_PP.

      For SquirrelCalc there is a very simplistic built-in parser"

      So, I think it wouldn't matter if I used Spreadsheet::Read instead of Spreadsheet::XLSX. In any case, I tried changing the use statement, and it still doesn't make a difference. The cells are still not read.

      2. Can you please explain what you mean when you say single- and double-quote characters are non-ASCII characters? Do you mean I should check if there are quotes in the Excel file itself? If so, no, there aren't any characters within quotes in the file.

      3. I'm pretty sure the file is not empty, and I verified it by opening it, with the Excel application.

        But when using Spreadsheet::Read, you should also use the methods that Spreadsheet::Read provides and not use the methods the undelying parser might use. That defeats the purpose of the wrapper.

        First install Spreadsheet::ParseXLSX and then read the file using the right syntax:

        use strict; use warnings; use Spreadsheet::Read; my $excel = ReadData ("xyz.xlsx"); defined $excel or die; print "Read with ", $excel->[0]{parser}, "\n";

        Regarding the quotes, that might be a cut-n-paste (windows) problem. The post of your code shows the use of "smart"-quotes “ U0201c \N{LEFT DOUBLE QUOTATION MARK} instead of plain duoble-quote ".


        Enjoy, Have FUN! H.Merijn
Re: Spreadsheet::XSLX parsing workbook, but not retrieving values from cells
by Anonymous Monk on Sep 17, 2014 at 11:27 UTC

    Does the "quick & dirty" parser not have a debug|verbose mode to show what it is doing? Or, use debugger or print statements to trace the parsing/errors.

    If the file was not empty, then there may be some problem being ignored.