Parsing horizontal table in Excel & loading it into a database

simak29 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks,

I've been playing around with Spreadsheet::ParseExcel and have had success extracting info from a vertical table. I am now looking at extracting info from a horizontal table which is not completely bound. So was hoping someone here has some advice for me:

My scenario is:

I have an excel spreadsheet with 3 sheets. Sheet1,Salary has the below:

Name, Salary in columns B & C. Extending 10 rows.

Name,   Mon-DD
Monica    104,000
Meredith  110,000
Brian     92,000
Madeline  112,0000
Averie    140,0000
[download]

The above is pretty straightforward to extract

The same spreadsheet has the below data

 Col E through J

Row 6 Col E   ColF     ColG     ColH   ColI     Col J
Row 7         Monica   Meredith Brian  Madeline Averie
Row 8         104,000  110,000   92,000  112,000  140,000
Row9
Row 10 Salary  100,000  105,000   85,000 110,000  125,000
Row 11 Salary%  75%      71%       80%    81%     69%
Row 12
Row 13 Bonus    4,000    5,000    7,000   2,000    15,000
Row 14 Bonus%    10%      20%     10%     2%       14%
[download]

Now my questions, what is the best way I can parse this data using SpreadSheet::ParseExcel and store the results in a table of the below structure. How do I make sure that the Salary, Salary%, Bonus, Bonus% are tying to the same name?

 Name  Total_Salary Salary Salary%  Bonus Bonus%
       Monica
       Meredith
       Brian
       Madeline
       Averie
[download]

I'm hoping to do this without using any VBA or Excel specific tools. Any guidance using this library will be greatly appreciated. I apologize in advance if the formatting seems off or if the question itself sounds vague. Thanks for your help in advance

Regards, Simak

Comment on Parsing horizontal table in Excel & loading it into a database Select or Download Code

Replies are listed 'Best First'.

Re: Parsing horizontal table in Excel & loading it into a database
by ww (Archbishop) on Jun 10, 2015 at 03:25 UTC

Need clarification: "Salary in columns B & C."

Why is the salary (which appears to be a single, 5 or 6 digit number, associated with each name) extended across two columns? Where do names go? Col A or Col B? Or do you mean the names go in B and the Salaries in C? In that case, what's in Col A?

Second clarification needed: your question indicates headers in both data sections, but since there are five names, then for the data in the second data section -- the one you say appears in "Col E through J" and should begin in row 6 -- there can be no headers OTHER THAN THE NAMES. Is this your intent?

HINT: you can use <table>, <tr>, and <td> markup to make your data render in a manner that provides a clear indication of your intent. See Markup in the Monastery ... and unless you're proficient with html markup, you should probably test your efforts in your scratchpad, rather than in a new note, and especially rather than by trying to update your existing post.

WAG: Are you trying to convey something like this?

Col A? Monica 104

Col A? Meredith 110

Col A? Brian 92

Col A? Madeline 112

Col A? Averie 140

Col A? ColB ColC ColD Monica Meredith Brian Madeline Averie

Col A? ColB ColC ColD 104 110 92 112 140

Salary ColB ColC ColD 100 105 85 110 125

Salary% ColB ColC ColD 75.00% 71.00% 80.00% 81.00% 69.00%

Bonus ColB ColC ColD 4 5 7 2 15

Bonus% ColB ColC ColD 10% 20% 10% 2% 14%

[reply]

Re: Parsing horizontal table in Excel & loading it into a database
by marinersk (Priest) on Jun 10, 2015 at 03:39 UTC

Transpose the loops for row and column. The rest should be more or less the same as your current code.

[reply]

Re: Parsing horizontal table in Excel & loading it into a database
by locked_user sundialsvc4 (Abbot) on Jun 10, 2015 at 13:41 UTC

Perhaps stating it slightly-differently: first, walk along the names-column-header row, looking to find the first row that is blank, or maybe, the last one that is not. (If you detect at this point that there are no names, or that there are blank columns within the list, die().) Now, you know the range of columns to be processed.

I would also examine the row-header columns, either to locate those rows or to die() if the spreadsheet is not exactly as expected. Now, you know, or have confirmed, which rows you need to look at. The necessary loop naturally follows.

FYI... I always code such programs very suspiciously. “Trust, but verify.™” Spreadsheet users can do anything they wish, and sometimes do, even by mistake. This can cause a subsequent computer program to commit the worst possible sin: to “appear to work correctly,” but to produce the wrong results. No one will know. Therefore, a little extra time spent verifying the expected content of rows and columns, by means of regular expressions, with die() being the response if anything appears not-right, makes for a much more robust and reliable program. If such a program runs to completion (and if its various tests have actually been tested!), then it serves as a positive indicator that its outputs (and, its inputs) are plausible.

Re^2: Parsing horizontal table in Excel & loading it into a database

by marinersk (Priest) on Jun 10, 2015 at 17:39 UTC

I agree with the Constructive Paranoia.

Being of an engineering mindset, I am even cautious when there is "no way" the data could be laid out incorrectly.

I've lost count of the number of times in my career my code has caught errors in an error-resistant environment, and handled it with some kind of grace (even yelling for help is more graceful than wandering off the edge of the Abyss in many cases), and I've had other folks asking how on earth I saw that coming. Simple: I didn't. I simply asked that most basic of engineering questions during the design phase: What could go wrong?

In an open-ended system such as this problem describes, Constructive Paranoia is absolutely warranted. I would almost say it is mandatory, but most folks have exceptions in their heads -- and, yes, Mike, it would not surprise me in the slightest if you were not amongst them :: grin ::.

The code can't react with judgement the way a human does (for better or for worse!). So you have to code all the edge and corner cases, and do your best to deal with the black hole cases as well. Otherwise, it will almost certainly come back to haunt you.

[reply]