qingxia has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monkers;

I did not have much luck on Stack Overflow, so I decided to raise the question here. Please see the original post:

http://stackoverflow.com/questions/15598273/parsing-an-xls-file-using-perl

I think amon provided workable code, except that the data does not have the repeating pattern he assumed, with a block every fourth row (maybe he missed my reply). Basically, there are more than the 3 fields that I would like to extract. There is also no repeating pattern, such as the same block appearing every 5 rows or so. The data is irregular in the sense that each item (in this case, a school) has an unequal number of sub-items.

e.g.

        col1                    col2
row1    School                  1
row2    Dean                    John
row3    No.stu.                 55
row4    some irrelevant stuff
row5    School2                 2
row6    Dean                    Tony
row7    No. stu.                60
row8    some irrelevant stuff
row9    School                  3
row10   Dean                    James
row11   No.stu.                 56
row12   No. teacher             20

Now the idea is that, although the data does not have a regular pattern, the last row of each block is always the same item, say, the average SAT score. I was wondering whether it is possible to write content-triggered code that collects everything it passes through until it reaches some specific content.
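
To make the idea concrete, here is a minimal sketch of what "content-triggered" could look like. It does not read a spreadsheet at all: @rows is a hypothetical in-memory stand-in for the two columns, and 'Avg SAT' and its values are made-up placeholders for whatever row really ends each block.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the two spreadsheet columns.
# 'Avg SAT' (and its values) are invented for illustration only.
my @rows = (
    [ 'School',      1      ],
    [ 'Dean',        'John' ],
    [ 'No.stu.',     55     ],
    [ 'Avg SAT',     1200   ],
    [ 'School',      2      ],
    [ 'Dean',        'Tony' ],
    [ 'No.stu.',     60     ],
    [ 'No. teacher', 20     ],
    [ 'Avg SAT',     1150   ],
);

my $trigger = 'Avg SAT';   # content that closes a block
my @blocks;                # one hash reference per completed block
my %current;               # rows collected since the last trigger

for my $r (@rows) {
    my ($label, $value) = @$r;
    $current{$label} = $value;
    if ($label eq $trigger) {
        push @blocks, {%current};   # store a copy, then start a fresh block
        %current = ();
    }
}

# Show each block on one line, labels sorted.
for my $block (@blocks) {
    print join(', ', map { "$_=$block->{$_}" } sort keys %$block), "\n";
}
```

The blocks may have different lengths; only the trigger row decides where one ends.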

Best regards,

Replies are listed 'Best First'.
Re: content triggered parsing in Spreadsheet-ParseExcel
by hdb (Monsignor) on Mar 25, 2013 at 12:10 UTC

    Here is a version that is more dynamic but assumes the data is in columns A and B, starting from row 1:

        School          1
        Dean            John
        No.stu.         55
        School          2
        Dean            Tony
        No. Students    60
        School          3
        Dean            James
        No.stu.         56
        No. Teacher     20

    Most of the code is taken from amon's example on stackoverflow.com

        use strict;
        use warnings;
        use Spreadsheet::ParseExcel;

        my ($infile) = @ARGV;

        my $parser   = Spreadsheet::ParseExcel->new();
        my $workbook = $parser->parse($infile);
        die $parser->error unless defined $workbook;
        my ($worksheet) = $workbook->worksheets();

        my %data;    # accumulate data here
        my $row    = 0;
        my $school = 0;
        while(1){
            my $cell = $worksheet->get_cell($row, 0);
            last unless defined($cell);
            my $key  = $cell->value();
            my $data = $worksheet->get_cell($row++, 1)->value();
            if( $key eq "School" ) {
                $school = $data;
            } else {
                $data{$school}{$key} = $data;
            }
        }

        # see what we got
        foreach my $s (sort keys %data) {
            print "School $s:\n";
            foreach my $fact (sort keys %{$data{$s}}) {
                print "\t$fact: $data{$s}{$fact}\n";
            }
        }

    which will print

        School 1:
                Dean: John
                No.stu.: 55
        School 2:
                Dean: Tony
                No. Students: 60
        School 3:
                Dean: James
                No. Teacher: 20
                No.stu.: 56

    If you now add more facts underneath a school, they will automatically be added to the hash.
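
    One caveat: a row with text in column A but no cell in column B (like the "some irrelevant stuff" rows in the question) would make get_cell return undef for column B, and calling value() on that dies. A possible guard is sketched below. It works on a hypothetical in-memory @rows list instead of a real worksheet, with undef standing in for a missing cell, so treat it as an illustration of the logic rather than drop-in code.

    ```perl
    use strict;
    use warnings;

    # Hypothetical stand-in for the worksheet: [column A, column B],
    # with undef where the sheet has no cell in column B.
    my @rows = (
        [ 'School',                1      ],
        [ 'Dean',                  'John' ],
        [ 'No.stu.',               55     ],
        [ 'some irrelevant stuff', undef  ],
        [ 'School',                2      ],
        [ 'Dean',                  'Tony' ],
        [ 'No. Students',          60     ],
    );

    my %data;      # school => { fact => value }
    my $school;    # stays undefined until the first School row

    for my $r (@rows) {
        my ($key, $value) = @$r;
        next unless defined $value;     # no column B: nothing to record
        if ($key eq 'School') {
            $school = $value;           # a new block starts here
            next;
        }
        next unless defined $school;    # ignore facts before the first school
        $data{$school}{$key} = $value;
    }

    foreach my $s (sort keys %data) {
        print "School $s:\n";
        print "\t$_: $data{$s}{$_}\n" for sort keys %{ $data{$s} };
    }
    ```

    With a real worksheet the same test would apply to the cell object returned for column B before calling value() on it.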

      Hi hdb,

      Thanks very much for the code. I learned a lot from it but still have some questions. Most of them may seem straightforward to you, but I hope to get them clarified.

      1. last unless defined($cell); I think this tells the loop when to stop, i.e. the loop stops when it reaches an undefined cell.

      2. my $data = $worksheet->get_cell($row++, 1)->value(); I guess this line increments the row number by one AFTER it fetches the column value.

      3.

          if( $key eq "School" ) {
              $school = $data;
          } else {
              $data{$school}{$key} = $data;
          }

      I am not sure about this one, but it seems to me that the hash %data has 2 layers: the first is the school, and the second contains the remaining facts. If the loop reaches a School row, it records the value as the first-layer key, and it records everything else as second-layer keys? Please correct me.

      Thanks a lot in advance!

        No corrections required. You got it all correct.
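
        For anyone who wants to check the three points without a spreadsheet, here is a small sketch. The arrays @col_a and @col_b are hypothetical stand-ins for columns A and B (undef marks the first empty cell); everything else mirrors the loop above.

        ```perl
        use strict;
        use warnings;

        # Hypothetical stand-ins for columns A and B.
        my @col_a = ('School', 'Dean', undef, 'never reached');
        my @col_b = (1, 'John');

        my %data;
        my $school = 0;
        my $row    = 0;
        while (1) {
            my $key = $col_a[$row];
            last unless defined $key;      # 1. stop at the first undefined cell
            my $value = $col_b[$row++];    # 2. read with the old $row, then add 1
            if ($key eq 'School') {
                $school = $value;          # 3. first-level key
            } else {
                $data{$school}{$key} = $value;    # 3. second-level key
            }
        }

        print "rows read: $row\n";
        print "Dean of school 1: $data{1}{Dean}\n";
        ```

        The loop ends with $row equal to 2, so the entry after the undef is never looked at, and $data{1}{Dean} holds John.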

      wow, it works really well. thanks a lot.