ginju has asked for the wisdom of the Perl Monks concerning the following question:

Hello Great Monks,

I need to convert some excel data into HTML.

What I Have:

Tasklist1.xls, Tasklist2.xls and Milestone.xls

What I need to do:

I need to open and read the Milestone.xls file and get the Milestones and the required tasks for each milestone (This file already has this information). Then, Refer these tasks in Tasklist1.xls and Tasklist2.xls to get the start and end dates (The tasks can be from either of the tasklist files). Then I need to create a table in HTML listing the Milestone, The tasks for each milestone and the start and end dates for each of the tasks. Thus creating a "schedule" for wach Milestone.

Thus the HTML looks like:
Milestone1 Blah Blah    
Task1 Blah Blah 06/07/04 06/10/04
Task2 Blah Blah 06/10/04 06/12/04

So on and so forth...

Any help would be greatly appreciated.

Wisdom Seeker

Replies are listed 'Best First'.
Re: EXCEL TO HTML Conversion w/Perl
by CountZero (Bishop) on Jun 07, 2004 at 21:59 UTC
    Depending on how your Excel spreadsheets are formated, you could try to get the data out of them through DBI and DBD:Excel.

    If the spreadsheets cannot be used as a database, you will have to use Spreadsheet::ParseExcel which offers a rather low level of access to the spreadsheet. There are some more sophisticated wrappers around this module, so have alook at CPAN!

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      I just did a search and other than DBD::Excel (which you mentioned), I only found Spreadsheet::ParseExcel::Simple which is very high level indeed and hasn't been updated since Dec 2002 so I don't know if it still works. Are there other modules you commonly use? If so, I'd love to know since I hate dealing with Excel at such a low level.
        I try to do most of my Excel-parsing through DBI or not at all. For one-off jobs I sometimes just dump the spreadsheet to a CSV-file and work with that.

        I find spreadsheets too much "free format" and contain too much "eye candy" for PHB's to work with. Most of the time I go the other way around: from a database to a spreadsheet (to give to my database-challenged colleagues).

        That being said, I should just give Spreadsheet:ParseExcel::Simple or its underlying Spreadsheet:ParseExcel a whirl. Maybe these modules are fully developped and did not need updating since 2002! (yeah yeah and maybe the moon is made out of green cheese)

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: EXCEL TO HTML Conversion w/Perl
by davidj (Priest) on Jun 07, 2004 at 22:05 UTC
    If you provide a sample of what the spreadsheets look like, that is, how the data is represented in each one, it would be easier to help you with some code options.

    davidj

      Hi David,

      This is how my excel work books look like:

      Milestone.xls

      MilestoneID Task IDs
      Milestone1 Task1a, Task1b, Task2a, Task2c
      Milestone2 Task1c, Task2a, Task1d, Task2b

      Where the Tasks come from TaskList1.xls and Tasklist2.xls. These files look like this (as an example snippet from TaskList1.xls):

      TaskID Task Description Start Date End Date
      Task1a Task1a Description 06/10/04 06/12/04
      Task1b Task1b Description 06/13/04 06/15/04

      So on and so forth...

      What I need to do is, open and read the Milestone.xls to get the Milestone and the task IDs. Then query these task IDs in the TaskList1.xls and Tasklist2.xls and then generate a "schedule" table for each milestone by getting the start and end dates for each task.

      I am looking for either a code snippet or an "efficient" way to do this.

      One suggestion I got was to read the Tasklist1 and Tasklist2 excel files into memory (hash table??) and then query since the Task IDs are unique.

      Any help would be greatly appreciated.

      Thank you very much.

      Wisdom Seeker, Andy

        This isn't very elegant (and maybe not too efficient), but it will store and extract the data in a way that is usable. I don't do html work, so you are on your own for figuring that part out :)
        Note: it assumes that in milestones.xls, all the taskID's are stored in a single cell.

        use Spreadsheet::ParseExcel::Simple; my (%milestones, %taskID); my $xls = Spreadsheet::ParseExcel::Simple->read('milestones.xls'); foreach my $sheet ($xls->sheets) { while ($sheet->has_data) { my @data = $sheet->next_row; next if $data[0] =~ m/ID/; next if $data[0] eq ''; my @tasks = split(",", $data[1]); foreach my $task (@tasks) { $milestones{$data[0]} = [ @tasks ]; } } } my $xls = Spreadsheet::ParseExcel::Simple->read('TaskLists.xls'); foreach my $sheet ($xls->sheets) { while ($sheet->has_data) { my @data = $sheet->next_row; next if $data[0] =~ m/ID/; next if $data[0] eq ''; my @tasks = @data[2..3]; $taskID{$data[0]} = [ @tasks ]; } } foreach my $milestone (keys %milestones) { my @tasks = @{ $milestones{$milestone} }; foreach my $task (@tasks) { print "milestone = $milestone, id = $task, start = @{ $taskID{ +$task} }[0], stop = @{ $taskID{$task} }[1]\n"; } } exit;