Andre_br has asked for the wisdom of the Perl Monks concerning the following question:

Hello my code-friends,

I am trying to make Perl read xls files and save them as a tsv .txt file(wich will be target of many checks, reads and at last transfer to mysql). But I am having a hard time with worksheets bigger than 5 thousand lines. With them the script just crashes and stops answering - with the others it goes ok, even that it is not fast at all.

Anyone of you has experience with large data manipulation in this task? Does the module in fact have this problem? Hereīs my code:

my $oExcel = new Spreadsheet::ParseExcel; my $file = "test4.xls"; my $oBook = $oExcel->Parse($file); my($iR, $iC, $oWkS, $oWkC); my @thisrow; # to store the row for latter processing # just the first worksheet $oWkS = $oBook->{Worksheet}[0]; for ( $iR = $oWkS->{MinRow} ; defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow}; $iR++) { @thisrow = (); for ( $iC = $oWkS->{MinCol}; defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol}; $iC++) { $oWkC = $oWkS->{Cells}[$iR][$iC]; if ($oWkC) { push (@thisrow, $oWkC->Value); print $thisrow[0]; # this last one ir just to test; here would # go the print to the .txt file handler } } }

I also know that I could use the unix shell to do that, but I asked the guys at my host and they said they never heard of a command line converter for that. Donīt know if it was the guy that didnīt know, or that itīs realy absent there...

Thanks a lot,

André

Replies are listed 'Best First'.
Re: Spreadsheet::ParseExcel crashes above 5K lines
by samtregar (Abbot) on Jan 09, 2005 at 17:20 UTC
    What exactly do you mean by "crashes and stops answering"? Do you see an error message?

    Have you tried running the script under the Perl debugger (see perldebug)? Try interupting it when the problem occurs to see what it's doing.

    Are the values for minRow, maxRow, minCol and maxCol correct? It looks to me like enough rows with bad or missing maxCol values could cause a significant pause in output.

    How much memory is the script using? You can get a rough idea by watching 'top' in a separate window while the script runs.

    -sam

Re: Spreadsheet::ParseExcel crashes above 5K lines
by jmcnamara (Monsignor) on Jan 10, 2005 at 00:14 UTC

    The problem probably isn't with the number of lines/rows, as such, but with the number of cells. Spreadsheet::ParseExcel creates a largish object for each data or formatted cell that it encounters. This generates a large memory overhead which can cause your program to grind to a halt. If you monitor your system memory while the program is running you can verify this.

    See Reducing the memory usage of Spreadsheet::ParseExcel for a workaround.

    --
    John.

      Hey John,

      I read your post about the reducing memory usage by avoiding the formating data. In fact, thatīs a great idea, and I really donīt want the formating data in these worksheets. But Iīm having trouble in fitting the code into mine. Can you or any of the fellow monks take a look?

      #!/usr/bin/perl -w use CGI qw( :standard); use Spreadsheet::ParseExcel; # The new objects my $q = new CGI; my $oExcel = Spreadsheet::ParseExcel->new( CellHandler => \&cell_handl +er, NotSetCell => 1 ); # McNamara's alternative cell handler sub cell_handler { my $workbook = $_[0]; my $sheet_index = $_[1]; my $row = $_[2]; my $col = $_[3]; my $cell = $_[4]; print $cell->{Val} , "<br>"; } # The header print $q->header( -type => "text/html"); # Starting the cronometer $start = (times) [0]; # The reading # my $file = "z4kjasdf.xls"; my $file = "acervo15k.xls"; # my $file = "teste1.xls"; my $oBook = $oExcel->Parse($file); my($iR, $iC, $oWkS, $oWkC); my @thisrow; # Just the first worksheet... $oWkS = $oBook->{Worksheet}[0]; for ( $iR = $oWkS->{MinRow} ; defined $oWkS->{MaxRow} && $iR <= $oWkS- +>{MaxRow}; $iR++) { @thisrow = (); for ( $iC = $oWkS->{MinCol}; defined $oWkS->{MaxCol} && $iC <= $oWk +S->{MaxCol}; $iC++) { $oWkC = $oWkS->{Cells}[$iR][$iC]; if ($oWkC) { push (@thisrow, $oWkC->{Val}); } } my ($col1, $col2, $col3, $col4, @othercols) = @thisrow; print "$col1 <br>"; } # Ending the cronometer $end = (times) [0]; print "<b>$iR rows read in " , $end-$start , " seconds";
      Iīm trying to read a 3,7 Mb worksheet with 15 thousand lines. My development station(wagon!) is a AMDK62 300 mhz with 96 Mb RAM.

      Thanks!

      André