Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello there. I seek the wisdom of a monk with knowledge of Excel. I'm using the Spreadsheet::ParseExcel::Simple module to read a fairly straightforward Excel spreadsheet. The problem comes when I encounter a blank cell. I want to copy each cell's contents into a plain array, inserting an "X" in place of any cell that is empty. However, despite stripping the contents of "empty" cells of all whitespace, carriage returns and newline characters, *sometimes* Perl still seems to think there is something in the cell. If I go into Excel and manually change the font on the empty cells, the problem goes away. Also, the cells that evaluate incorrectly are always the last item in the current row. I think it has something to do with an invisible control character being treated as a non-whitespace character for some reason. Any help with this problem would be appreciated. Here is a snippet of code:
$excelfile = "/data/MS/excel/contract_check.xls";
$xls = Spreadsheet::ParseExcel::Simple->read($excelfile);
@sheets = $xls->sheets;
while ($sheets[1]->has_data) {
    @hsbcdata = ();
    @hsbcdataread = $sheets[1]->next_row;
    foreach $_ (@hsbcdataread) {
        s/^\s*//;
        s/\s*\r*$//;
        chomp ($_);
        if (/\S+/) {
            push (@hsbcdata, $_);
        }
        else {
            push (@hsbcdata, "X");
        }
    }
    ...
}

Re: Perl and Excel
by Enlil (Parson) on Jan 13, 2003 at 21:05 UTC
    you might change this:
    if (/\S+/){ . . .}
    to something that matches what will exist in the cell. So instead of checking whether it has at least one non-space character, you can check that it has at least one of the characters you expect to find there. If you are dealing strictly with numbers:
    if (/[0-9]/){ ... }
    would work. Or, if you are dealing with numbers, letters and underscores:
    if (/[\w]/) {...}
    On another note, the + is redundant: the test is already true as soon as there is at least one of whatever character you are matching.
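To see why the + makes no difference in a true/false test, here is a minimal sketch. The helper name has_content and the sample strings are made up for illustration and are not from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string has at least one
# non-whitespace character.
sub has_content {
    my ($cell) = @_;
    return $cell =~ /\S/ ? 1 : 0;
}

# Made-up sample cell values, including whitespace-only strings.
for my $cell ('', '   ', "\t\r\n", 'abc', ' 42 ', '0') {
    my $with_plus = $cell =~ /\S+/ ? 1 : 0;
    # /\S/ and /\S+/ always agree when used purely as a boolean test.
    die "patterns disagree\n" unless $with_plus == has_content($cell);
}
print "/\\S/ and /\\S+/ agree on every sample\n";
```

Because \S only needs one non-whitespace character anywhere in the string, the test gives the same answer whether or not leading and trailing whitespace has been stripped first.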

    update: much to my chagrin. Yes \d is the same as [0-9] and the brackets are not necessary with \w or the \d for that matter. Thank you anonymous monk.

    -enlil

      [0-9] is the same as \d, and in [\w] the brackets aren't necessary; just use \w instead.
      Many thanks for your help guys, unfortunately the problem remains. I tried the \w character, and with and without +. Curiouser and curiouser.
        Eureka! I have since found that the problem is that @hsbcdataread = $sheets[1]->next_row sometimes returns an array with fewer than the expected 6 columns. For some reason it sometimes returns six columns, one of which is empty, and at other times it returns just 5 columns, so the foreach loop is never entered for the sixth column of my Excel spreadsheet. A quick and messy if statement has solved this problem. As is often the case, the problem was not what it initially appeared to be. Many thanks to those who contributed. New code:
        while ($sheets[1]->has_data) {
            @hsbcdata = ();
            @hsbcdataread = $sheets[1]->next_row;
            $size = @hsbcdataread;
            if ($size < 6) {
                $hsbcdataread[5] = "X";
            }
            foreach $_ (@hsbcdataread) {
                s/^\s*//;
                s/\s*\r*$//;
                chomp ($_);
                if (/\S+/) {
                    push (@hsbcdata, $_);
                }
                else {
                    push (@hsbcdata, "X");
                }
            }
            . .
        }
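For anyone who wants something less tied to the sixth column, here is a minimal sketch of the same idea: pad every row to a fixed width and substitute "X" for blanks in one pass. The helper name normalise_row and the sample rows are hypothetical, and it assumes that cells beyond the requested width can simply be dropped.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Spreadsheet::ParseExcel::Simple):
# return exactly $width cells, with "X" in place of any cell that is
# missing, undefined, or contains only whitespace.
sub normalise_row {
    my ($width, @cells) = @_;
    my @out;
    for my $i (0 .. $width - 1) {
        my $cell = defined $cells[$i] ? $cells[$i] : '';
        $cell =~ s/^\s+//;    # strip leading whitespace
        $cell =~ s/\s+$//;    # strip trailing whitespace, including \r and \n
        push @out, length($cell) ? $cell : 'X';
    }
    return @out;
}

# Made-up rows: one short (5 cells), one full with blank cells.
print join('|', normalise_row(6, 'a', 'b', 'c', 'd', 'e')), "\n";
print join('|', normalise_row(6, 'a', ' ', 'c', 'd', 'e', '')), "\n";
```

In the loop above this would be called as normalise_row(6, $sheets[1]->next_row), so a short row and a full row with an empty last cell end up with the same shape.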
Re: Perl and Excel
by jmcnamara (Monsignor) on Jan 14, 2003 at 00:33 UTC

    The Spreadsheet::ParseExcel::Simple next_row() method uses the Spreadsheet::ParseExcel Cell->Value() method to retrieve data from an Excel row.

    If a cell is blank then Spreadsheet::ParseExcel::Cell::Value returns an undef, which Spreadsheet::ParseExcel::Simple::next_row turns into an empty string for convenience.
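As a rough illustration of that conversion (a sketch of the idea only, not the module's actual source), undefined values can be mapped to empty strings like this. The helper name undef_to_empty and the sample values are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: replace undef with an empty string and leave
# every other value untouched.
sub undef_to_empty {
    return map { defined $_ ? $_ : '' } @_;
}

# Made-up stand-in for one row's cell values; undef marks a blank cell.
my @row = undef_to_empty('HSBC', undef, 42, undef);
print scalar(@row), " cells: [", join('][', @row), "]\n";
```

The point is that a blank cell still occupies a slot in the returned list, whereas a cell that was never stored has no slot at all.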

    So far so good. Where I think the problem arises is between what you perceive as a blank cell and what Excel perceives as a blank cell.

    Excel differentiates between an empty cell and a blank cell. An empty cell is a cell which doesn't contain data whilst a blank cell is a cell which doesn't contain data but does contain formatting. Excel stores blank cells but ignores empty cells.

    To add to the confusion, a cell that was formatted at some stage but now has its formatting removed may still have some internal flags set that aren't visible to the user. This would result in a blank cell instead of an empty cell. You can remove these using Edit->Clear->All in Excel.

    As such all of your s/// and chomp code is to no avail. Probably the best thing to do is just remove any empty strings from the end of the array returned by next_row():

    @hsbcdataread = $sheets[1]->next_row();
    pop @hsbcdataread while @hsbcdataread and $hsbcdataread[-1] eq "";
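Here is a small self-contained sketch of that idiom on made-up data, so the effect can be seen without a spreadsheet. The helper name trim_trailing_blanks and both sample rows are hypothetical. It assumes every element is defined, as the empty-string conversion described above implies.

```perl
use strict;
use warnings;

# Hypothetical helper: drop empty strings from the end of a row only;
# empty strings in the middle of the row are left alone.
sub trim_trailing_blanks {
    my @row = @_;
    pop @row while @row and $row[-1] eq '';
    return @row;
}

# Made-up rows: the last cell is stored as blank in one and absent in the other.
my @stored_blank = trim_trailing_blanks('a', '', 'c', 'd', 'e', '');
my @not_stored   = trim_trailing_blanks('a', '', 'c', 'd', 'e');

# After trimming, both rows have the same length.
print scalar(@stored_blank), " ", scalar(@not_stored), "\n";
```

Once trailing blanks are gone, every row can be padded back out to the expected width in one consistent way, rather than depending on whether Excel happened to store the last cell.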

    --
    John.