skelooth has asked for the wisdom of the Perl Monks concerning the following question:

Hello all. I have recently been porting OLE based perl scripts to a Unix environment and have hit a snag when dealing with Spreadsheet::ParseExcel. The module returns a maximum row of 2954 when some of the spreadsheets I need to parse are well above 5,000 rows.

I'm hoping someone knows of a fix or work around. I have tried ignoring the MaxRow but the module will not retrieve values beyond that point. I've also snooped around the module its self and I am unable to find this value hard coded in anywhere.

I am open (and grateful) for any unix compatable work around, but preferably perl. I am not sure what is dictating the row number cap and was surprised to find no results on google of people with similiar problems.

Update, this is some of the parsing code

for($r = $wks->{MinRow}; $wks->{MaxRow} && $r <= $wks->{MaxRow}; $ +r++) { if(cell($wks,$r,0) !~ /(group|category|project|client)/i) + { next; } if(cell($wks,$r,0) =~ /group/i) { $group= cell($wks,$r,1 +); next; } if(cell($wks,$r,0) =~ /category/i) { $category= cell($wks,$r,1 +); next; } if(cell($wks,$r,0) =~ /project/i) { $project= cell($wks,$r,1) +; $client= "0"; } if(cell($wks,$r,0) =~ /client/i) { $client= cell($wks,$r,1) +; }
# Utility function to safely look inside of a Cell sub cell { my ($Wks, $r, $c)= @_; if (defined $Wks->{Cells}[$r][$c]) { my $val= $Wks->{Cells}[$r][$c]->Value; $val =~ s/ \(/\(/gi; $val =~ s/formerly/Frm\./gi; $val =~ s/\s\s/ /gi; return $val; } return ""; }

Update 3 days later

One of the perlmonks was good enough to give me some code on his scratch pad that seemed to work, but now I have a WEIRD problem! Note the following code:

#!/usr/bin/perl -w use strict; use Spreadsheet::ParseExcel; my $parse_excel = Spreadsheet::ParseExcel->new(); my $workbook1= $parse_excel->Parse('data/DASNYFootnotes.xls'); print "Max row : ", 1 + ${$workbook1->{Worksheet}}[0]->{MaxRow +}, "\n"; my $workbook2= $parse_excel->Parse("data/dasnyFootnotes.xls"); print "Maxrow b: ", 1 + ${$workbook2->{Worksheet}}[0]->{MaxRow +}, "\n";

Gives the output

Max row : 3249
Maxrow b: 2955

and if I switch the order, maxrow b is still only 2955!!! I don't understand? They look identical

It was my own stupid fault! Some how two files had the same name with different case in my data folder! Sorry!

Replies are listed 'Best First'.
Re: Spreadsheet::ParseExcel MaxRow is maxing out at 2954
by kyle (Abbot) on Apr 29, 2008 at 15:52 UTC

    I don't have any experience with Spreadsheet::ParseExcel, but I wonder what's on the last row that you can parse successfully and what's on the row after it. Maybe your problem is caused by the contents of the file breaking the module in some way. I'd open up the file and look in that area for something unusual.

Re: Spreadsheet::ParseExcel MaxRow is maxing out at 2954
by derby (Abbot) on Apr 29, 2008 at 15:56 UTC

    kyle makes a good point and I would start there ... but you don't show how you set your workseet. Could there be multiple sheets in the excel file and you're getting the wrong one?

    #!/usr/bin/perl use strict; use warnings; use Spreadsheet::ParseExcel; my $book = Spreadsheet::ParseExcel::Workbook->Parse( "file.xls" ); foreach my $sheet ( @{$book->{Worksheet}} ) { print "SHEET Name: ", $sheet->{Name}, "\n"; print "SHEET MaxRow: ", $sheet->{MaxRow}, "\n"; }

    -derby

      The following is a copy paste of the spreadsheet near line 2954. I have tried adding and deleting rows to see if it effects the cap, but it still stops at 2954.

      436	82995	Western New York Catholic Long-Term Care, Inc., Series 1993
      		current refunded on May 5, 2004 by mortgage proceeds sales as follows:
      		$10,045,000 called on 6/17/04 @ 101%
      		
      437	82996	Wartburg Home of Evangelical Lutheran Church, Series 1993
      		current refunded on April 28, 2004 by mortgage proceeds sales as follows:
      		$210,000 called 8/1/04 @ par.
      		$15,060,000 called 8/1/04 @ 101%
      ***line 2954 here****		
      438	82997	Elizabeth Church Manor NH Series 1993 current refunded on September 25, 2003
      		by IDA bonds as follows:
      		$6,725,000 called 10/27/03 @ 102%.
      

      As for setting the spreadsheet, here is a more complete code fragment

      my $bk= $Excel->Parse("data/dasnyFootnotes.xls"); my($r, $c, $wks); $wks= $bk->{Worksheet}[0]; my $entry=0; for($r = $wks->{MinRow}; cell($wks,$r,0) !~ /^end$/i; $r++) { if(cell($wks,$r,0) =~ /^(\d+)$/) { #print cell($wks,$r,0)."\n"; $entry=$1; $html[$entry]->[0]= cell($wks,$r,1); $html[$entry]->[1]= ""; # pre init avoids warnings } if($entry > 0 && cell($wks,$r,2) ne "") { $html[$entry]->[1].= cell($wks,$r,2)."<br />"; } }
Re: Spreadsheet::ParseExcel MaxRow is maxing out at 2954
by jethro (Monsignor) on Apr 29, 2008 at 17:44 UTC
    How big is your spreadsheet when you cap it at line 2954? Maybe something near a power of 2?

    Since Excel file format might not be line oriented (I see lots of unpack statements in the parser source), the limitation might be in bytes. Do other excel files stop at exactly the same line ?

    You might edit some lines in the spreadsheet to be much smaller and see if this lets you see more lines. If yes, then there might be (for example) pointers/numbers of double length in the spreadsheet that the parser doesn't know about or just calculates wrong.