in reply to rowspan vals in Excel SS are not interpreted by ParseExcel::Simple

Simple does not do spanning... Now the Spreadsheet::ParseExcel data is still accessible in Spreadsheet::ParseExcel::Simple

use Spreadsheet::ParseExcel::Simple; use Data::Dumper; my $xls = Spreadsheet::ParseExcel::Simple->read('excel.xls'); foreach my $sheet ($xls->sheets) { die Dumper $sheet; #while ($sheet->has_data) { # my @data = $sheet->next_row; #} }

Take a look at the fields AlignH, AlignV and Merged. If Merged=1 you can have a value or no value.

So, in order to keep on using Spreadsheet::ParseExcel::Simple I added a small unmerge() function that removes the span's:

use Spreadsheet::ParseExcel::Simple; my $xls = Spreadsheet::ParseExcel::Simple->read('excel.xls'); foreach my $sheet ($xls->sheets) { #unmerge unmerge($sheet->{sheet}); while ($sheet->has_data) { my @data = $sheet->next_row; print join("\t", @data),"\n"; } } sub unmerge{ my($oWkS) =@_; print "--------- SHEET:", $oWkS->{Name}, "\n"; for(my $iR = $oWkS->{MinRow} ; defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) + { for(my $iC = $oWkS->{MinCol} ; defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxC +ol} ; $iC++) { $oWkC = $oWkS->{Cells}[$iR][$iC]; next unless $oWkC; if($oWkC->is_merged){ if($oWkC->Value){ print "COPY V=$oWkC->{Format}->{AlignV} H=$oWkC->{ +Format}->{AlignH}\n"; for my $i (1..$oWkC->{Format}->{AlignV}){ print "ADDV [$iR+$i][$iC] \n"; $oWkS->{Cells}[$iR+$i][$iC]->{_Value} = $oWkC- +>Value; $oWkS->{Cells}[$iR+$i][$iC]->{Merged}=0; } for my $i (1..$oWkC->{Format}->{AlignH}){ print "ADDH [$iR][$iC+$i] \n"; $oWkS->{Cells}[$iR][$iC+$i]->{_Value} = $oWkC- +>Value; $oWkS->{Cells}[$iR][$iC+$i]->{Merged}=0; } } } print "( $iR , $iC ) ==", $oWkC->Value, " ", $oWkC->{Merge +d},"\n"; } } }

This should copy/clone your spanned data... while still being able to use Simple...

in my case, my output was:

--------- SHEET:Sheet1 COPY V=2 H=0 ADDV [0+1][0] ADDV [0+2][0] ( 0 , 0 ) ==cat 1 ( 0 , 1 ) ==Fluffy ( 1 , 0 ) ==cat 0 ( 1 , 1 ) ==Vera ( 2 , 0 ) ==cat 0 ( 2 , 1 ) ==Manxie cat Fluffy cat Vera cat Manxie

edit: can you send me your xls example where the cell is at the second row, Libreoffice forces the data from the rightmost/topmost...

edit2: Ok, merged cells that already had a value, and my subroutine dies because the AlignH/AlignV are messed up. This will need some more tinkering before it works properly. (but need to sleep now)

Replies are listed 'Best First'.
Re^2: rowspan vals in Excel SS are not interpreted by ParseExcel::Simple
by FreeBeerReekingMonk (Deacon) on Feb 09, 2017 at 08:34 UTC
    Used poj's knowhow to make it work. (what was I thinking yesterday evening, AlignH is for indentation, not merged cells). Code is smaller to boot!

    use Spreadsheet::ParseExcel::Simple; my $xls = Spreadsheet::ParseExcel::Simple->read('excel.xls'); foreach my $sheet ($xls->sheets) { #unmerge unmerge($sheet->{sheet}); while ($sheet->has_data) { my @data = $sheet->next_row; print join("\t", @data),"\n"; } } sub unmerge{ my($oWkS) =@_; print "--------- SHEETNAME:", $oWkS->{Name}, "\n"; my $range = $oWkS->get_merged_areas(); for (@$range){ print " start row : $_->[0] start col : $_->[1] end row : $_->[2] end col : $_->[3] \n\n"; $oWkC = $oWkS->{Cells}[$_->[0]][$_->[1]]; for my $iR ($_->[0] .. $_->[2]){ for my $iC ($_->[1] .. $_->[3]){ next if ($iR eq $_->[0] && $iC eq $_->[1]); print "SET [$iR][$iC] <= [$_->[0]][$_->[1]]\n"; $oWkS->{Cells}[$iR][$iC]->{_Value} = $oWkC->Value; } } } print "--------- DONE Unmerging.\n"; }

    Output:

    --------- SHEETNAME:Sheet1 start row : 0 start col : 0 end row : 2 end col : 0 SET [1][0] <= [0][0] SET [2][0] <= [0][0] --------- DONE Unmerging. cat Fluffy cat Vera cat Manxie

    Caveat: The {_Value} is still not guaranteed to be in the topleft cell from a range (as misterperl had it in the second column), so the code might need to be expanded to first SEARCH for a defined {_Value} in the range. Then have that as the source data cell, then iterate over the range writing all other cells that are undefined...

    You can access the functions from within Simple like:

    $sheet->{sheet}->get_merged_areas();

Re^2: rowspan vals in Excel SS are not interpreted by ParseExcel::Simple
by misterperl (Friar) on Feb 09, 2017 at 13:57 UTC
    Thank You, great example! From the hints here I wrote this short solution (using the non-simple ParseExcel) :
    # fill in merged cells my $ma = $page->{sheet}->{MergedArea}; for my $a ( @$ma ) { my ( $tr, $lc, $br, $rc ) = @$a; # propogate values top left to bottom right my $p = $page->{sheet}->{Cells}->[$tr]->[$lc]->{Val}; for ( my $row=$tr; $row<=$br; $row++ ) { for ( my $col=$lc; $col<=$rc; $col++ ) { $page->{sheet}->{Cells}->[$row]->[$col]->{Val}=$p; } } } # for $a
    Seems to do the trick! I've seen some cases where the cell hash also has another Val which is something like _Val. I was never sure why those were there but I've seen examples where it contains the actual val. So in that case, $p may have to be assigned differently... Thanks all. I was floundering until you pointed out all the "merged" stuff. Seems like an odd term- I would have thought they'd use "span" not "merge"..
      Ah, when I was reading the code, I found this explanation about Val and _Value in ~/perl5/lib/perl5/Spreadsheet/ParseExcel/Cell.pm

      ############################################################ # # value() # # Returns the formatted value of the cell. # sub value { my $self = shift; return $self->{_Value}; } ############################################################ # # unformatted() # # Returns the unformatted value of the cell. # sub unformatted { my $self = shift; return $self->{Val}; } ############################################################

      You see, sometimes a cell has a formula, and it holds the formula string, but also a resulting value. So Val holds =sum(A1+1) and _Value holds "42". It was/is? a problem in a XLS writer module, where only one was filled in, so you were required to open the spreadsheet in excel, let it calculate the values, then save it again in order to keep on processing the sheet.

Re^2: rowspan vals in Excel SS are not interpreted by ParseExcel::Simple
by misterperl (Friar) on Feb 09, 2017 at 15:26 UTC
    ..oddly enough, after pushing the value into the {Val} and {_Value} fields, I figured I was good to go. But I then used:
    my @row=$page->next_row;
    and all of the vals I propagated in are missing.

    Yet if I inspect the cell Val at the row and col, they are all correct. So I had to manually push values into the array:

    for ( my $c=0; $c<=$maxCol; $c++ ) { next if $row[$c] =~ /./; $row[$c]=$page->{sheet}->[$row]->[$c]->{Val}; }
    Sure didn't expect that! * * * Followup- not sure why I was seeing that- but now everything is behaving without that step.. YAY thanks guys you were all a great help.
      See also previous explanation:

      The perl module uses the function Value() to retrieve the value from the variable {_Value}. So setting {Val} will not work with next_row().

      sub next_row { map { $_ ? $_->Value : "" } @{$_[0]->{sheet}->{Cells} }

      Now, because a spanned cell COULD hold a value() BEFORE spanning (usually empty, but not necessarily), you will get that value() instead of the spanned value...

      spanning is like mounting a drive over a directory that may already hold files.... but it works now, congrats!