Re: Microsoft Analysis Server - Data Cubes

by richardX (Pilgrim)
on Dec 01, 2002 at 20:04 UTC


in reply to Microsoft Analysis Server - Data Cubes

I have built a system similar to the one being discussed. The base system that drove the entire application was the Data Warehouse (DW). Creating a DW properly, with efficient reporting in mind, is a huge topic in itself.

The next major piece of the puzzle was the Extract, Transform and Load (ETL) application. We wrote our own ETL tool because we needed a data-driven application that non-technical people could use to map data from external applications to the DW. There are many ETL tools on the market if you need a commercial one.
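
To make the "data driven" part concrete, here is a hypothetical sketch of the idea: the source-to-DW field mapping lives in data (here a hash; in practice a table that non-technical staff can edit) rather than in code. All table, column, and DSN names below are invented for illustration.

    use strict;
    use DBI;

    # source column => warehouse column; editable as data, not code
    my %map = (
        CUST_NO => 'customer_id',
        INV_DT  => 'invoice_date',
        QTY     => 'units_sold',
    );

    my $src = DBI->connect('dbi:ODBC:legacy_app', '', '', { RaiseError => 1 });
    my $dw  = DBI->connect('dbi:Pg:dbname=warehouse', '', '', { RaiseError => 1 });

    my @from = sort keys %map;
    my @to   = @map{@from};

    my $ins = $dw->prepare('INSERT INTO fact_sales ('.join(',', @to)
                          .') VALUES ('.join(',', ('?') x @to).')');

    my $sth = $src->prepare('SELECT '.join(',', @from).' FROM invoices');
    $sth->execute;
    while (my @row = $sth->fetchrow_array) {
        $ins->execute(@row);    # one translated row into the DW
    }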

The final piece of the puzzle is the reporting. We pre-built OLAP cubes that would answer the most common questions users would ask of the DW, including the drill-downs. It would be a huge challenge to create dynamic OLAP cubes constructed on the fly from whatever fields the user desires. There are so many requirements to worry about when building an OLAP cube: what your facts and dimensions are, the size of the tables you are extracting from, whether the fields are indexed, and many other concerns. We had to manage our users' expectations, letting them know that we could build additional cubes, but that allowing them to create anything they wanted was not workable; runaway processes on the database server were the biggest concern. The end result was that we did not allow users to create cubes, only execute them.
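
As a purely illustrative sketch of the fact/dimension and indexing concerns above, the skeleton of a star schema might look like this (table names are invented; the column types are PostgreSQL-flavoured to match the demo code later in this thread):

    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=warehouse', '', '', { RaiseError => 1 });

    # one row per product: a dimension the cube slices on
    $dbh->do(q{
        CREATE TABLE dim_product (
            product_key serial PRIMARY KEY,
            class       varchar(32),
            product_id  varchar(32)
        )
    });

    # one row per sale: the facts the cube aggregates
    $dbh->do(q{
        CREATE TABLE fact_sales (
            product_key integer REFERENCES dim_product,
            fiscal_year integer,
            units_sold  integer
        )
    });

    # index every key the cube extraction will filter or group on
    $dbh->do('CREATE INDEX fact_sales_product_idx ON fact_sales (product_key)');
    $dbh->do('CREATE INDEX fact_sales_year_idx    ON fact_sales (fiscal_year)');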

The users accessed the cubes from a web page that asked for date ranges and other parameters. The application then ran the cube and exported the results to Excel, an HTML report, or email. We trained each user to manipulate the Excel pivot tables and graphs once they had the data they needed. Power users then took the Excel data and created tons of their own reports, graphs, and analysis. Since it was impossible to anticipate all the power users' requirements, we let them run with the extracted data without further supervision.
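
The Excel export step can be driven from Perl with Win32::OLE. This is an untested sketch of that idea, not our production code; the result rows and file name are invented, and it assumes a local Excel install.

    use strict;
    use Win32::OLE;

    my @rows = ( [ 'Fiscal Year', 'Class',   'Units Sold' ],
                 [ 2002,          'Widgets', 1234         ] );

    my $xl = Win32::OLE->new('Excel.Application')
        or die 'Cannot start Excel: ' . Win32::OLE->LastError;
    my $book  = $xl->Workbooks->Add;
    my $sheet = $book->Worksheets(1);

    # write the result set cell by cell (Excel is 1-indexed)
    for my $r (0 .. $#rows) {
        for my $c (0 .. $#{ $rows[$r] }) {
            $sheet->Cells($r + 1, $c + 1)->{Value} = $rows[$r][$c];
        }
    }

    $book->SaveAs('C:\\reports\\cube_results.xls');    # directory must exist
    $xl->Quit;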

For more mainstream users, we created a Digital Dashboard using Microsoft's Digital Dashboard toolkit. Users could access the cubes from the Dashboard after logging in and authenticating themselves. We used Role Based Access Control (RBAC) to control access to reports, with each role granting access to specific cubes. Our analysis also identified the most common reports that multiple users and/or departments would use on a daily and weekly basis; those cubes were run on a schedule at night and the reports exported to HTML and stored in folders as canned reports. Some RBAC users could only see these canned reports and never accessed cubes directly. This prevented the servers from being overwhelmed by redundant cube processing.
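
The RBAC piece boils down to a role-to-cubes lookup. This toy version (role and cube names invented) shows the kind of check involved before anything is executed:

    use strict;

    # role => cubes that role may execute
    my %role_cubes = (
        sales_analyst => [ qw(sales_by_region sales_by_product) ],
        executive     => [ qw(weekly_summary) ],
    );

    sub can_run_cube {
        my ($role, $cube) = @_;
        return grep { $_ eq $cube } @{ $role_cubes{$role} || [] };
    }

    # e.g. refuse the request unless can_run_cube($user_role, $requested_cube)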

We purposely did not constrain which programming tools we used, to give us maximum flexibility. We used Perl, batch files, Excel macros, Digital Dashboard calls, HTML, SQL, stored procedures, multidimensional expressions (MDX) scripts, and any applicable Perl modules like DBD/DBI etc. Most MS programmers do not know about the hidden power of MDX. I would recommend learning everything you can about MDX, since it talks to Analysis Services directly and most efficiently. Maybe you could write a Perl module that controls MDX scripts :}
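
For a taste of what driving MDX from Perl might look like: Analysis Services speaks OLE DB through the MSOLAP provider, so one route is pushing MDX through ADO via Win32::OLE. This is an untested sketch, not something we shipped; the server, catalog, cube, dimension, and measure names are all invented.

    use strict;
    use Win32::OLE;

    my $conn = Win32::OLE->new('ADODB.Connection')
        or die 'Cannot create ADODB.Connection: ' . Win32::OLE->LastError;
    $conn->Open('Provider=MSOLAP;Data Source=olapserver;Initial Catalog=Sales');

    # units sold by fiscal year and product class
    my $mdx = 'SELECT { [Measures].[Units Sold] } ON COLUMNS, '
            . 'CROSSJOIN( [Fiscal Year].Members, '
            . '[Product].[Class].Members ) ON ROWS '
            . 'FROM [SalesCube]';

    # plain ADO flattens the cellset into an ordinary recordset
    my $rs = $conn->Execute($mdx);
    until ($rs->{EOF}) {
        my @cells = map { $rs->Fields($_)->{Value} }
                    0 .. $rs->Fields->{Count} - 1;
        print join("\t", @cells), "\n";
        $rs->MoveNext;
    }
    $conn->Close;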

Richard

There are three types of people in this world, those that can count and those that cannot. Anon


Re: Re: Microsoft Analysis Server - Data Cubes
by Anonymous Monk on Dec 01, 2002 at 23:44 UTC
    OLAP is nothing but an SQL aggregate with a big "GROUP BY" whose results you then display in a matrix of columns and rows. I prefer having the option to query the live data. If you want to boost performance by distilling the data, do so as a nightly batch job that stores the aggregate query results in a heavily indexed "temp" table, avoiding the bulk of the joins and/or aggregation at query time. That's just a preference, though.
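
    A sketch of that nightly batch job, using the same table and column names as the demo code below (the summary table name is made up):

    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=sales', 'user', 'pass',
                           { RaiseError => 1, AutoCommit => 1 });

    # rebuild the pre-aggregated "temp" table from the live data
    $dbh->do('DROP TABLE IF EXISTS sales_summary');
    $dbh->do(q{
        CREATE TABLE sales_summary AS
        SELECT fiscal_year, sale_in_territory, class, product_id,
               SUM(units_sold) AS units_sold
        FROM transactions
        GROUP BY fiscal_year, sale_in_territory, class, product_id
    });

    # heavy indexing so the web page's queries never touch the detail rows
    $dbh->do('CREATE INDEX sales_summary_idx ON sales_summary
              (fiscal_year, sale_in_territory, class, product_id)');

    $dbh->disconnect;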

    What follows is some proof-of-concept code. I pulled it from a larger project, so you will need to fill in some bits and pieces, like setting up $dbh via DBI, the HTML head/body tags, and spitting out the HTTP headers.

    It was written to use PostgreSQL, but the SQL is generic enough.

    Change @rowheads, @colheads, @datacols, and $cubetable to match your system.

    NOTE: There is much room for improvement, but the basic concept is there. Much of the convolution is due to my need to generate the HTML table in proper order; in retrospect there is probably a saner way to code it...

    use strict;
    use DBI;

    my $dbh;    # SET THIS TO A VALID DBI CONNECTION HANDLE.

    my @rowheads  = qw/fiscal_year sale_in_territory/;
    my @colheads  = qw/class product_id/;
    my @datacols  = qw/units_sold/;
    my $cubetable = 'transactions';

    my @pivots = @rowheads;
    push @pivots, @colheads;

    my (%values,%names,%ordinals,%filtered,%totals,%columnheaders);

    # one pass over the detail rows, pre-sorted by every pivot column
    my $sth = $dbh->prepare('select '.join(',',@rowheads).','
        .join(',',@colheads).','.join(',',@datacols)
        .' from '.$cubetable.' ORDER BY '.join(',',@pivots))
        || die $dbh->errstr;
    $sth->execute || die $dbh->errstr;

    my @row;
    while (@row = $sth->fetchrow_array) {
        # put the data values in a safe place
        my @data;
        for (my $i = 0; $i < @datacols; $i++) {
            my $d = pop @row;
            push @data, $d;
        }
        my $continue = 0;
        foreach (@data) { $continue = 1 if defined $_; }
        next unless $continue;

        # accumulate the measures under a "rowkey colkey" ordinal key
        my @key = @{GetKey(@row)};
        my $key = join(' ',@key);
        if (defined $values{$key}) {
            my $count = -1;
            foreach (@{$values{$key}}) { $_ = $_ + $data[$count++]; }
        }
        else {
            $values{$key} = \@data;
        }

        # now set up the column header hash
        splice(@key,0,(@rowheads));
        $key = join(' ',@key);
        if (defined $columnheaders{$key}) {
            my $count = -1;
            foreach (@{$columnheaders{$key}}) { $_ = $_ + $data[$count++]; }
        }
        else {
            $columnheaders{$key} = \@data;
        }
    }

    # build the column headers in a 2D array
    #my $colcount = 1;
    #foreach (@colheads) {
    #    $colcount = $colcount * (@{$names{$_}});
    #}
    #$colcount = $colcount * (@datacols);
    my $stop;
    my @colkey;
    my $count = 0;
    foreach (@colheads) {
        $colkey[$count] = 0;
        $stop = $stop.$#{$names{$colheads[$count]}}.' ';
        $count++;
    }
    chop $stop;

    my @cols;
    my $count = 0;
    while ($stop ne join(' ',@colkey)) {
        CalcCol($#colheads);
        my $key = join(' ',@colkey);
        # next if !defined $columnheaders{$key};
        $cols[$count] = $key;
        $count++;
    }
    #print '<pre>',main::Dumper(@cols),'</pre>';

    # display
    print '<table border = 1>';
    my $datacount;

    # print the column headers
    my $count = 0;
    my @colkeys;
    foreach my $h (@colheads) {
        $datacount = 0;
        print '<tr>';
        print '<td colspan="'.(@rowheads-1).'"></td>';
        print "<td>$h</td>";
        foreach my $key (@cols) {
            next if !defined $columnheaders{$key};
            $datacount++;
            my @tmp = split(' ', $key);
            foreach (@datacols) {
                print '<td>'.${$names{$h}}[$tmp[$count]].'</td>';
            }
        }
        print '</tr>';
        $count++;
    }

    # print the row heads & data col heads
    print '<tr>';
    print '<td>',join('</td><td>',@rowheads),'</td>';
    for (my $i = 0; $i < $datacount; $i++) {
        foreach my $h (@datacols) { print "<td>$h</td>"; }
    }
    print '</tr>';

    # print the row headers and data
    my @rowheader;
    my @rowkey;
    my $rowcount;
    my $count = 0;
    my $stop = '';
    foreach (@rowheads) {
        $rowkey[$count] = 0;
        $stop = $stop.$#{$names{$rowheads[$count]}}.' ';
        $count++;
    }
    chop $stop;

    my $first = 1;
    while (join(' ',@rowkey) ne $stop) {
        CalcRow($#rowheads);
        my $rowOK = 0;
        my $count = 0;
        foreach my $h (@rowheads) {
            $rowheader[$count] = ${$names{$h}}[$rowkey[$count]];
            $count++;
        }
        my $rowstr = '<tr><td>'.join('</td><td>',@rowheader).'</td>';
        foreach my $k (@cols) {
            next if !defined $columnheaders{$k};
            my $data = $values{join(' ',@rowkey).' '.$k};
            my @data;
            if (defined($data)) {
                @data = @{$data};
                $rowOK = 1;
            }
            else {
                foreach (@datacols) { push @data, undef; }
            }
            $rowstr = $rowstr.'<td>'.join('</td><td>',@data).'</td>';
        }
        if ($rowOK) { print $rowstr,'</tr>'; }
    }

    # odometer-style counters that step through every row/column key
    sub CalcRow {
        my $pos = shift;
        if ($pos == 0) {
            $rowkey[0]++ if $rowkey[0] <= $#{$names{$rowheads[0]}};
            return 0;
        }
        else {
            if ($rowkey[$pos] >= $#{$names{$rowheads[$pos]}}) {
                # PRINT TOTAL LINE HERE
                $rowkey[$pos] = 0;
                CalcRow($pos-1);
            }
            else {
                $rowkey[$pos]++;
                return 0;
            }
        }
        return 1;
    }

    sub CalcCol {
        my $pos = shift;
        if ($pos == 0) {
            $colkey[0]++ if $colkey[0] <= $#{$names{$colheads[0]}};
            return 0;
        }
        else {
            if ($colkey[$pos] >= $#{$names{$colheads[$pos]}}) {
                # PRINT TOTAL LINE HERE
                $colkey[$pos] = 0;
                CalcCol($pos-1);
            }
            else {
                $colkey[$pos]++;
                return 0;
            }
        }
        return 1;
    }

    print '</table>';
    #print '<p>columnheaders:<br><pre>',main::Dumper(%columnheaders),'</pre><p>';
    #print '<p>stop: ',main::Dumper($stop);
    #print '<p>stop: ',main::Dumper(join(' ',@rowkey));
    #print '<p>NAMES: ',main::Dumper(%names),'<p>';
    #print 'VALUES: ',main::Dumper(%values),'<p>';
    #print 'ORDINALS: ',main::Dumper(%ordinals),'<p>';

    # map each distinct member value to a small integer ordinal,
    # remembering the name so headers can be printed later
    sub GetKey {
        my $count = 0;
        my @key;
        foreach my $val (@_) {
            if (defined $ordinals{$pivots[$count].$val}) {
                push @key, $ordinals{$pivots[$count].$val};
            }
            else {
                my $o = push @{$names{$pivots[$count]}}, $val;
                $o--;
                $ordinals{$pivots[$count].$val} = $o;
                push @key, $o;
            }
            $count++;
        }
        return \@key;
    }

    1;
    Questions, comments to rob AHT cabrion DHOT com.

    I really would like to know if someone decides to hack on this, or builds a module from it.
