Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a question regarding writing directly to a spreadsheet (using Spreadsheet::WriteExcel). At the moment I have a DBI/DBD function which extracts data to an intermediate file. The file is opened and some parsing is done before writing to a spreadsheet (an .xls file). I'd like to get rid of the intermediate step of writing to the file by writing directly to the spreadsheet (.xls), but am not sure how to do this?
Here's a snippet of what I do currently.

# Open the output file for the cmdtxt audit trail selected
open TXT, ">$txt" or die "Can't create $txt : $!\n";

# execute the sql statement
$sth = $dbh->prepare("@sql");
$sth->execute;

while ($data = $sth->fetchrow_arrayref()) {
    chomp $data->[13];
    $data->[13] =~ tr/\t//d;    # get rid of the tabs
    if ($data->[4] > 1) {
        print TXT "$data->[13]";
    }
    else {
        print TXT "\nBOR $data->[3]|$data->[2]|$data->[9]|$data->[10]|$data->[4]|$data->[13]";
    }
}
$sth->finish;
$dbh->disconnect;
close TXT;

# Create the spreadsheet and set up formats
$workbook = Spreadsheet::WriteExcel->new("my.xls");
# Set up various workbook formats here....

# Create some worksheets
$sheet1 = $workbook->add_worksheet('Tools');
$sheet2 = $workbook->add_worksheet('Client');
$sheet3 = $workbook->add_worksheet('Fares');

# create some column headings for each worksheet here...
# Set up some lookup lists using Regexp::List here...

# Read the intermediate file and populate the worksheets
$/ = "BOR ";
open TXT, "<$txt" or die "Can't open $txt : $!\n";
while (<TXT>) {
    chomp;
    if ( m[/\-] ) {
        $_ .= <TXT> until m[\-/];
        s[ \s? / \- .+? \- / \s? ][]smgx;    # cut out some unwanted txt
        chomp;
    }
    my @rec = split /\|/;
    $aa = $rec[0];
    $bb = $rec[1];
    $cc = $rec[2];
    $dd = $rec[3];
    # $rec[4] isn't used here
    $ee = $rec[5];
    if ($ee =~ /$regexp1/) {
        popxls($sheet1, $normal, $darow);    # populate worksheet 1
        $darow++;
    }
    elsif ($ee =~ /$regexp2/) {
        popxls($sheet2, $normal, $arow);     # populate worksheet 2
        $arow++;
    }
    elsif ($ee =~ /$regexp3/) {
        popxls($sheet3, $normal, $srow);     # populate worksheet 3
        $srow++;
    }
}
close TXT;

sub popxls {
    $col = 0;
    my $type = $_[0];
    my $fmt  = $_[1];
    my $row  = $_[2];
    $type->write($row, $col, "$ee", $fmt);
    $col++;
    $type->write($row, $col, $aa, $fmt);
    $col++;
    $type->write($row, $col, "$bb", $fmt);
    $col++;
    $type->write($row, $col, "$cc", $fmt);
    $col++;
    $type->write($row, $col, "$dd", $fmt);
}
It seems inefficient to me to write to an intermediate file, then reopen this file and do some more processing on it before writing to the spreadsheet file. Can this be done in memory instead? If so, how?

Thanks in advance

Replies are listed 'Best First'.
Re: Direct to spreadsheet
by McDarren (Abbot) on Jan 23, 2006 at 10:58 UTC
    Well, to answer your question first (and it's a little bit difficult without seeing some sample data) - I'd probably suggest using something like selectall_hashref and loading all of your data into a hashref. You could then write directly to the Excel worksheets.
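    For example - just a sketch, with a made-up 'id' key field and 'client'/'fare' column names standing in for whatever your query actually returns:

    my $rows = $dbh->selectall_hashref($sql, 'id');   # one hashref per row, keyed by id

    my $row_num = 1;
    for my $id (sort keys %$rows) {
        my $rec = $rows->{$id};
        $sheet1->write($row_num, 0, $rec->{client}, $normal);   # named access, no positional indices
        $sheet1->write($row_num, 1, $rec->{fare},   $normal);
        $row_num++;
    }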

    A few comments about your code:

    # execute the sql statement
    $sth = $dbh->prepare("@sql");
    ... looks a little odd to me. It seems to suggest that you are passing a list of statements, rather than a single one?

    Also, you appear to be selecting at least 14 columns of data, yet you only use 6 of them. What is the point of that? Are you doing select * from ... or something?

    I strongly suspect that your code could be significantly simplified by improving your SQL select statement to only return the data that you actually need to use. Perhaps you could post that along with a sample set of data?

    Cheers,
    Darren :)

    Update: One thing that I forgot to mention is that your choice of variable names makes your code a little difficult to follow. Stuff like $aa = $rec[0]; doesn't really tell you much about the data. It's a personal thing, but I usually try to make my variable names as descriptive as possible - makes it easier for the next person who comes along and has to debug/maintain my code (usually me) :)

    That's one reason why I like to use selectall_hashref on the results of my DB queries. I can then refer to the data in a very descriptive way.

      Thanks for the comments. I'd like to answer your queries if I may.
      The @sql is used because an SQL string, which consists of many parts including a "select * from", is built further up the code. The SQL has been tuned for maximum efficiency; it's the Perl bit which hasn't. The code shown is a small subsection of the whole story. Apologies if I didn't make this clear. This is my first post on this site.
      I don't use selectall_hashref because I found that for larger datasets it significantly reduces the performance of returning the dataset required. I appreciate that the code is much more readable using selectall_hashref, and for smaller datasets I do use it.
        The @sql is used because a sql string which consists of many parts including a "select * from" is built further up the code.

        Okay, but it still doesn't make sense. $dbh->prepare accepts just a single parameter (an SQL statement string). So as far as I can see, the only way your code can possibly work is if you are passing a single-element list. In which case, you should be assigning your SQL string to a scalar variable (i.e. $sql).

        However, that doesn't really address your current problem. If reading your entire result set into a hashref isn't an option, then you probably want a LoL (list of lists).
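        Sketching that (untested, and just illustrating the shape of the data):

        my $rows = $sth->fetchall_arrayref;   # LoL: one arrayref per row
        for my $row (@$rows) {
            # process $row->[3], $row->[13], etc. and write straight to the worksheets
        }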

        In any case, your current use of fetchrow_arrayref is unnecessary. You could simply do something like:

        while (my @row = $sth->fetchrow_array) {
        ...and then the line
        print TXT "\nBOR $data->[3]|$data->[2]|$data->[9]|$data->[10]|$data->[4]|$data->[13]";
        ...simply becomes
        print TXT "\nBOR ", join("|", @row[3,2,9,10,4,13]);
        ...or, as GrandFather suggested,
        push @data, "\nBOR " . join("|", @row[3,2,9,10,4,13]);

        Apologies if I'm not appearing very helpful. But as I said earlier, I'm finding it difficult to visualise the solution without actually seeing what your data looks like. Perhaps some more experienced monk will chip in and offer their advice :)

        Cheers,
        Darren :)

Re: Direct to spreadsheet
by GrandFather (Saint) on Jan 23, 2006 at 10:48 UTC

    A pretty lightweight way of doing it is to replace your print TXT lines with push @data, and your while (<TXT>) loop with for (@data).

    You need to add a my @data; line and remove all the TXT related stuff.
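
    Untested, but roughly like this (using $row for each fetched row so it doesn't clash with the @data array):

    my @data;
    while (my $row = $sth->fetchrow_arrayref()) {
        chomp $row->[13];
        $row->[13] =~ tr/\t//d;
        if ($row->[4] > 1) {
            push @data, $row->[13];
        }
        else {
            push @data, "\nBOR " . join '|', @{$row}[3,2,9,10,4,13];
        }
    }

    # ... build the workbook as before, then instead of re-reading TXT:
    for (@data) {
        # existing parsing and popxls() calls go here
    }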


    DWIM is Perl's answer to Gödel
      I tried this but I must be doing something wrong. I think the problem is with this statement:
      for (@data2) {
          chomp;
          if ( m[/\*] ) {
              $_ .= <TXT> until m[\*/];
              s[ \s? / \* .+? \* / \s? ][]smgx;   # Get rid of comments, we aren't interested in checking comment content
              chomp;
          }
      I'm not sure how to change this (the TXT bit) to reflect the fact that I'm using an array instead?

        Sorry, didn't notice that wrinkle the first time through. <TXT> needs to disappear - you are no longer using a file, remember. I'd rework it to shift the lines out of the array, thus:

        while (@data2) {
            $_ = shift @data2;
            if ( m[/\*] ) {
                $_ .= shift @data2 until m[\*/] or !@data2;
                s[ \s? / \* .+? \* / \s? ][]smgx;
            }
        }

        Note that the chomps are not required because no trailing newlines were added to the lines in the array. Note also the check for end of data in the until.
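
        To put it back in context (again untested - the split and the popxls() dispatch are just your existing code dropped back in):

        while (@data2) {
            $_ = shift @data2;
            if ( m[/\*] ) {
                $_ .= shift @data2 until m[\*/] or !@data2;
                s[ \s? / \* .+? \* / \s? ][]smgx;   # strip the comment block
            }
            my @rec = split /\|/;
            # ... your existing $regexp1/$regexp2/$regexp3 tests and popxls() calls go here
        }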


        DWIM is Perl's answer to Gödel