Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a question regarding writing directly to a spreadsheet (using Spreadsheet::WriteExcel). At the moment I have a DBI/DBD function which extracts data to an intermediate file. The file is opened and some parsing is done before writing to a spreadsheet (an .xls file). I'd like to get rid of the intermediate step of writing to the file by writing directly to the spreadsheet (.xls), but am not sure how to do this?
Here's a snippet of what I do currently.

# Open the output file for the cmdtxt audit trail selected
open TXT, ">$txt" or die "Can't create $txt : $!\n";

# execute the sql statement
$sth = $dbh->prepare("@sql");
$sth->execute;

while ($data = $sth->fetchrow_arrayref()) {
    chomp $data->[13];
    $data->[13] =~ tr/\t//d;    # get rid of the tabs
    if ($data->[4] > 1) {
        print TXT "$data->[13]";
    }
    else {
        print TXT "\nBOR $data->[3]|$data->[2]|$data->[9]|$data->[10]|$data->[4]|$data->[13]";
    }
}
$sth->finish;
$dbh->disconnect;
close TXT;

# Create the spreadsheet and set up formats
$workbook = Spreadsheet::WriteExcel->new("my.xls");
# Set up various workbook formats here....

# Create some worksheets
$sheet1 = $workbook->add_worksheet('Tools');
$sheet2 = $workbook->add_worksheet('Client');
$sheet3 = $workbook->add_worksheet('Fares');

# create some column headings for each worksheet here...
# Set up some lookup lists using Regexp::List here...

# Read the intermediate file and populate the worksheets
$/ = "BOR ";
open TXT, "<$txt" or die "Can't open $txt : $!\n";
while (<TXT>) {
    chomp;
    if ( m[/\-] ) {
        $_ .= <TXT> until m[\-/];
        s[ \s? / \- .+? \- / \s? ][]smgx;    # cut out some unwanted txt
        chomp;
    }
    my @rec = split /\|/;
    $aa = $rec[0];
    $bb = $rec[1];
    $cc = $rec[2];
    $dd = $rec[3];
    # $rec[4] isn't used here
    $ee = $rec[5];
    if ($ee =~ /$regexp1/) {
        popxls($sheet1, $normal, $darow);    # populate worksheet 1
        $darow++;
    }
    elsif ($ee =~ /$regexp2/) {
        popxls($sheet2, $normal, $arow);     # populate worksheet 2
        $arow++;
    }
    elsif ($ee =~ /$regexp3/) {
        popxls($sheet3, $normal, $srow);     # populate worksheet 3
        $srow++;
    }
}
close TXT;

sub popxls {
    $col = 0;
    my $type = $_[0];
    my $fmt  = $_[1];
    my $row  = $_[2];
    $type->write($row, $col, "$ee", $fmt);
    $col++;
    $type->write($row, $col, $aa, $fmt);
    $col++;
    $type->write($row, $col, "$bb", $fmt);
    $col++;
    $type->write($row, $col, "$cc", $fmt);
    $col++;
    $type->write($row, $col, "$dd", $fmt);
}
It seems inefficient to me to write to an intermediate file, then reopen this file and do some more processing on it before writing to the spreadsheet file. Can this be done in memory instead? If so, how?

Thanks in advance

Replies are listed 'Best First'.
Re: Direct to spreadsheet
by McDarren (Abbot) on Jan 23, 2006 at 10:58 UTC
    Well, to answer your question first (and it's a little bit difficult without seeing some sample data) - I'd probably suggest using something like selectall_hashref and loading all of your data into a hashref. You could then write directly to the Excel worksheets.
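    For example - just a sketch, with a made-up 'id' key field and 'client'/'fare' column names standing in for whatever your query actually returns:

    my $rows = $dbh->selectall_hashref($sql, 'id');   # one hashref per row, keyed by id

    my $row_num = 1;
    for my $id (sort keys %$rows) {
        my $rec = $rows->{$id};
        $sheet1->write($row_num, 0, $rec->{client}, $normal);   # named access, no positional indices
        $sheet1->write($row_num, 1, $rec->{fare},   $normal);
        $row_num++;
    }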

    A few comments about your code:

    # execute the sql statement
    $sth = $dbh->prepare("@sql");
    ... looks a little odd to me. It seems to suggest that you are passing a list of statements, rather than a single one?

    Also, you appear to be selecting at least 14 columns of data, yet you only use 6 of them. What is the point of that? Are you doing select * from ... or something?

    I strongly suspect that your code could be significantly simplified by improving your SQL select statement to only return the data that you actually need to use. Perhaps you could post that along with a sample set of data?

    Cheers,
    Darren :)

    Update: One thing that I forgot to mention is that your choice of variable names makes your code a little difficult to follow. Stuff like $aa = $rec[0]; doesn't really tell you much about the data. It's a personal thing, but I usually try to make my variable names as descriptive as possible - makes it easier for the next person who comes along and has to debug/maintain my code (usually me) :)

    That's one reason why I like to use selectall_hashref on the results of my DB queries. I can then refer to the data in a very descriptive way.

      Thanks for the comments. I'd like to answer your queries if I may.
      The @sql is used because an SQL string, which consists of many parts including a "select * from", is built further up the code. The SQL has been tuned for maximum efficiency; it's the Perl bit which hasn't. The code shown is a small subsection of the whole story. Apologies if I didn't make this clear. This is my first post on this site.
      I don't use selectall_hashref because I found that for larger datasets it significantly reduces the performance of returning the dataset required. I appreciate that the code is much more readable using selectall_hashref, and for smaller datasets I do use it.
        The @sql is used because a sql string which consists of many parts including a "select * from" is built further up the code.

        Okay, but it still doesn't make sense. $dbh->prepare accepts just a single parameter (an SQL statement string). So as far as I can see, the only way your code can possibly work is if you are passing a single-element list. In which case, you should be assigning your SQL string to a scalar variable (i.e. $sql).

        However, that doesn't really address your current problem. If reading your entire result set into a hashref isn't an option, then you probably want a LoL (list of lists).
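        Sketching that (untested, and just illustrating the shape of the data):

        my $rows = $sth->fetchall_arrayref;   # LoL: one arrayref per row
        for my $row (@$rows) {
            # process $row->[3], $row->[13], etc. and write straight to the worksheets
        }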

        In any case, your current use of fetchrow_arrayref is unnecessary. You could simply do something like:

        while (my @row = $sth->fetchrow_array) {
        ...and then the line
        print TXT "\nBOR $data->[3]|$data->[2]|$data->[9]|$data->[10]|$data->[4]|$data->[13]";
        ...simply becomes
        print TXT "\nBOR ", join("|", @row[3,2,9,10,4,13]);
        ...or, as GrandFather suggested,
        push @data, "\nBOR " . join("|", @row[3,2,9,10,4,13]);

        Apologies if I'm not appearing very helpful. But as I said earlier, I'm finding it difficult to visualise the solution without actually seeing what your data looks like. Perhaps some more experienced monk will chip in and offer their advice :)

        Cheers,
        Darren :)

Re: Direct to spreadsheet
by GrandFather (Saint) on Jan 23, 2006 at 10:48 UTC

    A pretty lightweight way of doing it is to replace your print TXT lines with push @data, and your while (<TXT>) loop with for (@data).

    You need to add a my @data; line and remove all the TXT related stuff.
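
    Untested, but roughly like this (using $row for each fetched row so it doesn't clash with the @data array):

    my @data;
    while (my $row = $sth->fetchrow_arrayref()) {
        chomp $row->[13];
        $row->[13] =~ tr/\t//d;
        if ($row->[4] > 1) {
            push @data, $row->[13];
        }
        else {
            push @data, "\nBOR " . join '|', @{$row}[3,2,9,10,4,13];
        }
    }

    # ... build the workbook as before, then instead of re-reading TXT:
    for (@data) {
        # existing parsing and popxls() calls go here
    }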


    DWIM is Perl's answer to Gödel
      I tried this but I must be doing something wrong. I think the problem is with this statement:
      for (@data2) {
          chomp;
          if ( m[/\*] ) {
              $_ .= <TXT> until m[\*/];
              s[ \s? / \* .+? \* / \s? ][]smgx;   # Get rid of comments, we aren't interested in checking comment content
              chomp;
          }
      I'm not sure how to change this (the TXT bit) to reflect the fact that I'm using an array instead?

        Sorry, didn't notice that wrinkle the first time through. <TXT> needs to disappear - you are no longer using a file, remember. I'd rework it to shift the lines out of the array, thus:

        while (@data2) {
            $_ = shift @data2;
            if ( m[/\*] ) {
                $_ .= shift @data2 until m[\*/] or !@data2;
                s[ \s? / \* .+? \* / \s? ][]smgx;
            }
        }

        Note that the chomps are not required because no trailing newlines were added to the lines in the array. Note also the check for end of data in the until.
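
        To put it back in context (again untested - the split and the popxls() dispatch are just your existing code dropped back in):

        while (@data2) {
            $_ = shift @data2;
            if ( m[/\*] ) {
                $_ .= shift @data2 until m[\*/] or !@data2;
                s[ \s? / \* .+? \* / \s? ][]smgx;   # strip the comment block
            }
            my @rec = split /\|/;
            # ... your existing $regexp1/$regexp2/$regexp3 tests and popxls() calls go here
        }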


        DWIM is Perl's answer to Gödel