content triggered parsing in Spreadsheet-ParseExcel

qingxia has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I did not have much luck on Stack Overflow, so I decided to ask here. Please see the original question:

http://stackoverflow.com/questions/15598273/parsing-an-xls-file-using-perl

I think amon provides workable code, except that the data does not have the repeating pattern he assumed (a new block every fourth row); maybe he missed my reply. Basically, there are more than just the 3 fields that I would like to extract. Also, the data has no repeating pattern such as the same block appearing every 5 rows or so. It is irregular in the sense that each item (in this case, a school) has a different number of sub-items.

e.g.

          col1      col2
    row1  School    1
    row2  Dean      John
    row3  No.stu.   55
    row4  some irrelevant stuff 
    row5  School    2
    row6  Dean      Tony 
    row7  No. stu.  60 
    row8  some irrelevant stuff
    row9  School    3
    row10 Dean      James
    row11 No.stu.   56
    row12 No. teacher 20

Now the idea is that, although the data does not have a regular pattern, the last row of each block would always be the same item, say, the average SAT score. I was wondering if it is possible to write content-triggered code that collects everything it passes over until it reaches some specific content.

Best regards,


Replies are listed 'Best First'.
Re: content triggered parsing in Spreadsheet-ParseExcel by hdb (Monsignor) on Mar 25, 2013 at 12:10 UTC
Here is a version that is more dynamic, but it assumes the data is in columns A and B, starting from row 1:

    School        1
    Dean          John
    No.stu.       55
    School        2
    Dean          Tony
    No. Students  60
    School        3
    Dean          James
    No.stu.       56
    No. Teacher   20

Most of the code is taken from amon's example on stackoverflow.com.

    use strict;
    use warnings;
    use Spreadsheet::ParseExcel;

    my ($infile) = @ARGV;

    my $parser   = Spreadsheet::ParseExcel->new();
    my $workbook = $parser->parse($infile);
    die $parser->error unless defined $workbook;
    my ($worksheet) = $workbook->worksheets();

    my %data;        # accumulate data here
    my $row    = 0;
    my $school = 0;  # value of the most recent "School" row

    while(1){
        my $cell = $worksheet->get_cell($row, 0);
        last unless defined($cell);   # stop at the first empty cell in column A
        my $key = $cell->value();
        my $data = $worksheet->get_cell($row++, 1)->value();
        if( $key eq "School" ) {
            $school = $data;                # a new block starts here
        } else {
            $data{$school}{$key} = $data;   # file the fact under the current school
        }
    }

    # see what we got
    foreach my $s (sort keys %data) {
        print "School $s:\n";
        foreach my $fact (sort keys %{$data{$s}}) {
            print "\t$fact: $data{$s}{$fact}\n";
        }
    }

which will print

    School 1:
            Dean: John
            No.stu.: 55
    School 2:
            Dean: Tony
            No. Students: 60
    School 3:
            Dean: James
            No. Teacher: 20
            No.stu.: 56

If you now add more facts underneath a school, they will automatically be added to the hash.
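The original question also asked about closing a block when a specific row is reached (the example given was an average SAT score). The following is only a minimal sketch of that idea, not part of hdb's answer. To keep it runnable without a spreadsheet it works on an in-memory list of (column A, column B) pairs; the `parse_blocks` helper, the `Avg SAT` label, and all sample values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the (column A, column B) pairs read from the sheet.
# The "Avg SAT" label and its values are invented for this sketch.
my @rows = (
    [ 'School',       1      ],
    [ 'Dean',         'John' ],
    [ 'No.stu.',      55     ],
    [ 'Avg SAT',      1200   ],
    [ 'School',       2      ],
    [ 'Dean',         'Tony' ],
    [ 'No. Students', 60     ],
    [ 'No. Teacher',  20     ],
    [ 'Avg SAT',      1150   ],
);

# Collect rows into one record until the end marker appears, then start over.
# A trailing block that never reaches the marker is dropped.
sub parse_blocks {
    my ($end_key, @pairs) = @_;
    my @records;
    my %current;
    for my $pair (@pairs) {
        my ($key, $value) = @$pair;
        next unless defined $key && length $key;   # skip blank rows
        $current{$key} = $value;
        if ($key eq $end_key) {
            push @records, {%current};             # store a copy of the block
            %current = ();
        }
    }
    return @records;
}

my @records = parse_blocks('Avg SAT', @rows);
for my $r (@records) {
    print join(', ', map { "$_=$r->{$_}" } sort keys %$r), "\n";
}
```

With a real workbook, the pairs would come from `get_cell($row, 0)` and `get_cell($row, 1)` as in the loop above. The difference from hdb's version is only the trigger: a block is closed by its last row instead of being opened by its first.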
Re^2: content triggered parsing in Spreadsheet-ParseExcel by qingxia (Novice) on Mar 27, 2013 at 09:41 UTC
Hi hdb, thanks very much for the code. I learned a lot from it but still have some questions. Most of them may seem straightforward to you, but I hope to get them clarified.

1. `last unless defined($cell);` I think this tells the loop when to stop, i.e. the loop stops when it reaches an undefined cell.

2. `my $data = $worksheet->get_cell($row++, 1)->value();` I guess this line increments the row number by one AFTER it fetches the column value.

3. About this one I am not sure:

    if( $key eq "School" ) {
        $school = $data;
    } else {
        $data{$school}{$key} = $data;
    }

It seems to me that the hash %data has 2 layers: the first is the school, and the second contains the rest of the facts. If the loop reaches a School row, it records the value as the first-layer key, and it records everything else under second-layer keys?

Please correct me. Thanks a lot in advance!
Re^3: content triggered parsing in Spreadsheet-ParseExcel by hdb (Monsignor) on Mar 27, 2013 at 09:46 UTC
No corrections required. You got it all correct.
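For readers who want to check points 2 and 3 themselves, here is a small sketch that runs without a spreadsheet. It is an illustration only: the `@column_b` array and the list of pairs are made-up stand-ins for the cells the real loop reads.

```perl
use strict;
use warnings;

# Point 2: $row++ yields the old value first, then increments.
my @column_b = ('1', 'John', '55');   # made-up stand-in for column B
my $row   = 0;
my $first = $column_b[$row++];        # reads index 0; $row is 1 afterwards
print "fetched '$first', row is now $row\n";

# Point 3: a hash of hashes, keyed by school and then by fact name.
my %data;
my $school = 0;
for my $pair (['School', 1], ['Dean', 'John'], ['No.stu.', 55],
              ['School', 2], ['Dean', 'Tony']) {
    my ($key, $data) = @$pair;
    if ($key eq 'School') {
        $school = $data;                  # start of a new block
    } else {
        $data{$school}{$key} = $data;     # inner hash is created on first use
    }
}
print "Dean of school 2: $data{2}{Dean}\n";
print "Schools seen: ", join(', ', sort keys %data), "\n";
```

The inner hash for each school does not have to be created explicitly; Perl autovivifies `$data{$school}` the first time a fact is assigned under it.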
Re^4: content triggered parsing in Spreadsheet-ParseExcel by qingxia (Novice) on Mar 27, 2013 at 09:54 UTC
Re^4: content triggered parsing in Spreadsheet-ParseExcel by qingxia (Novice) on Mar 27, 2013 at 13:16 UTC
Re^2: content triggered parsing in Spreadsheet-ParseExcel by qingxia (Novice) on Mar 25, 2013 at 22:08 UTC
Wow, it works really well. Thanks a lot.