dwhalen1 has asked for the wisdom of the Perl Monks concerning the following question:

I use the below code to read spread sheets. I am being inputed a line number that the header starts on. but I found that many of the input spreadsheets have a few to many blank rows lines starting at the top. there will be free form cells filled in but these are above the real header. So when I try to count the number rows down from top the count is off by the top blank rows and I donot know how to count them. example a spread sheet starts with 5 total blank rows and I am given 20 as the line # the real header start on... the real count i find to be 15 because the top 5 don't get seen in code below. Is there a answer to true row counting thanks David --------------------
use Spreadsheet::BasicRead; my $ss = new Spreadsheet::BasicRead($xlsFileName) || die "Could not o +pen '$xlsFileName': $!"; while (my $data = $ss->getNextRow()) { $row++; $d = "$row "; print join('|', $d, @$data), "\n"; }

Replies are listed 'Best First'.
Re: reading xls
by roboticus (Chancellor) on Mar 18, 2014 at 11:04 UTC

    dwhalen:

    Usually I simply read rows until I find the header, something like:

    while (my $data = $ss->getNextRow()) { # skip lines until we find column header "Last Name" in first colum +n next if $data[0] eq 'Last Name'; ... stuff ... }

    If your data doesn't *have* any headers, just some blank rows at the top, you can do basically the same thing--skip rows until you get a line that's not blank:

    while (my $data = $ss->getNextRow()) { # skip lines until we find a non-blank row next unless grep { ! /^\s+$/ } @$data; ... stuff ... }

    Or, if your data may have arbitrary junk before it, but your data is always the same number of columns, you can delete rows that have the wrong number of columns:

    while (my $data = $ss->getNextRow()) { # skip lines unless they have the right number of columns next unless $expected_columns == @$data; ... stuff ... }

    You'll want to take some precautions to do extra checking, though, because these methods can be fragile. Even if the spreadsheets are computer-generated and have the same format, someone's going to munge one before sending it to you at some point, and you don't want to have to figure out the problem later. Instead, be sure to reject the spreadsheet with an appropriate message if you can.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: reading xls
by tosaiju (Acolyte) on Mar 18, 2014 at 11:04 UTC
    if I got it correctly - I think you can try the foreach as in Spreadsheet::XLSX module documentation to traverse thru each row.
    foreach $row ($sheet -> {MinRow}+1 .. $sheet -> {MaxRow})

    http://search.cpan.org/~dmow/Spreadsheet-XLSX-0.13-withoutworldwriteables/lib/Spreadsheet/XLSX.pm
Re: reading xls
by locked_user sundialsvc4 (Abbot) on Mar 18, 2014 at 14:47 UTC

    Usually, I prefer to use OLE to open and manage spreadsheets using Excel, instead of trying to parse the file itself.

    It is also very desirable if there are (or, could be), named cell-ranges within the spreadsheet document, so that it basically self-identifies what areas within the sheet or the workbook are of interest.

    If you can’t arrange for any of these things, the program simply has to be able to iterate through the rows or columns, looking (through tests of several cells) for “what looks like data.”   Normally, I will design the program so that it inspects the file first, sniffing for problems so that if the wrong file has been provided or it doesn’t contain good data, nothing will be loaded from it.