simak29 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks,

I've been playing around with Spreadsheet::ParseExcel and have had success extracting info from a vertical table. I am now looking at extracting info from a horizontal table which is not completely bound. So was hoping someone here has some advice for me:

My scenario is:

I have an excel spreadsheet with 3 sheets. Sheet1,Salary has the below:

Name, Salary in columns B & C. Extending 10 rows.

Name, Mon-DD Monica 104,000 Meredith 110,000 Brian 92,000 Madeline 112,0000 Averie 140,0000

The above is pretty straightforward to extract

The same spreadsheet has the below data

Col E through J Row 6 Col E ColF ColG ColH ColI Col J Row 7 Monica Meredith Brian Madeline Averie Row 8 104,000 110,000 92,000 112,000 140,000 Row9 Row 10 Salary 100,000 105,000 85,000 110,000 125,000 Row 11 Salary% 75% 71% 80% 81% 69% Row 12 Row 13 Bonus 4,000 5,000 7,000 2,000 15,000 Row 14 Bonus% 10% 20% 10% 2% 14%

Now my questions, what is the best way I can parse this data using SpreadSheet::ParseExcel and store the results in a table of the below structure. How do I make sure that the Salary, Salary%, Bonus, Bonus% are tying to the same name?

Name Total_Salary Salary Salary% Bonus Bonus% Monica Meredith Brian Madeline Averie

I'm hoping to do this without using any VBA or Excel specific tools. Any guidance using this library will be greatly appreciated. I apologize in advance if the formatting seems off or if the question itself sounds vague. Thanks for your help in advance

Regards, Simak

Replies are listed 'Best First'.
Re: Parsing horizontal table in Excel & loading it into a database
by ww (Archbishop) on Jun 10, 2015 at 03:25 UTC

    Need clarification: "Salary in columns B & C."

    Why is the salary (which appears to be a single, 5 or 6 digit number, associated with each name) extended across two columns? Where do names go? Col A or Col B? Or do you mean the names go in B and the Salaries in C? In that case, what's in Col A?

    Second clarification needed: your question indicates headers in both data sections, but since there are five names, then for the data in the second data section -- the one you say appears in "Col E through J" and should begin in row 6 -- there can be no headers OTHER THAN THE NAMES. Is this your intent?

    HINT: you can use <table>, <tr>, and <td> markup to make your data render in a manner that provides a clear indication of your intent. See Markup in the Monastery ... and unless you're proficient with html markup, you should probably test your efforts in your scratchpad, rather than in a new note, and especially rather than by trying to update your existing post.

    WAG: Are you trying to convey something like this?

    Col A?Monica104
    Col A?Meredith 110
    Col A?Brian 92
    Col A?Madeline 112
    Col A?Averie 140
    Col A?ColBColCColDMonicaMeredithBrianMadelineAverie
    Col A?ColBColCColD10411092112140
    SalaryColBColCColD10010585110125
    Salary%ColBColCColD75.00%71.00%80.00%81.00%69.00%
     
    BonusColBColCColD457215
    Bonus%ColBColCColD10%20%10%2%14%
Re: Parsing horizontal table in Excel & loading it into a database
by marinersk (Priest) on Jun 10, 2015 at 03:39 UTC
    Transpose the loops for row and column. The rest should be more or less the same as your current code.
Re: Parsing horizontal table in Excel & loading it into a database
by locked_user sundialsvc4 (Abbot) on Jun 10, 2015 at 13:41 UTC

    Perhaps stating it slightly-differently:   first, walk along the names-column-header row, looking to find the first row that is blank, or maybe, the last one that is not.   (If you detect at this point that there are no names, or that there are blank columns within the list, die().)   Now, you know the range of columns to be processed.

    I would also examine the row-header columns, either to locate those rows or to die() if the spreadsheet is not exactly as expected.   Now, you know, or have confirmed, which rows you need to look at.   The necessary loop naturally follows.

    FYI...  I always code such programs very suspiciously.   “Trust, but verify.™”   Spreadsheet users can do anything they wish, and sometimes do, even by mistake.   This can cause a subsequent computer program to commit the worst possible sin:   to “appear to work correctly,” but to produce the wrong results.   No one will know.   Therefore, a little extra time spent verifying the expected content of rows and columns, by means of regular expressions, with die() being the response if anything appears not-right, makes for a much more robust and reliable program.   If such a program runs to completion (and if its various tests have actually been tested!), then it serves as a positive indicator that its outputs (and, its inputs) are plausible.

      I agree with the Constructive Paranoia.

      Being of an engineering mindset, I am even cautious when there is "no way" the data could be laid out incorrectly.

      I've lost count of the number of times in my career my code has caught errors in an error-resistant environment, and handled it with some kind of grace (even yelling for help is more graceful than wandering off the edge of the Abyss in many cases), and I've had other folks asking how on earth I saw that coming. Simple: I didn't. I simply asked that most basic of engineering questions during the design phase: What could go wrong?

      In an open-ended system such as this problem describes, Constructive Paranoia is absolutely warranted. I would almost say it is mandatory, but most folks have exceptions in their heads -- and, yes, Mike, it would not surprise me in the slightest if you were not amongst them :: grin ::.

      The code can't react with judgement the way a human does (for better or for worse!). So you have to code all the edge and corner cases, and do your best to deal with the black hole cases as well. Otherwise, it will almost certainly come back to haunt you.