in reply to database design question
I manage this exact scenario for 20-30 partners. In my case the source data is essentially all the same -- they want our content sliced and diced in ways they've defined. They define what they want (which of our tables/columns to query), what to call it (their tags), and what format they want it in (XML, delimited, CSV, fixed width, etc.). I built a tool set that does data transforms using an input/filter/output model. The inputs and outputs are primarily wrappers around existing CPAN modules: e.g., database tables using DBI; Excel using Spreadsheet::ParseExcel and Spreadsheet::WriteExcel; XML using a templating system we wrote around several existing Perl XML systems; and so on -- you get the idea.
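To make the wrapper idea concrete, here's a minimal sketch of what "fitting a CPAN module into a well-defined interface" might look like. The package name, the `put_row`/`as_string` methods, and the buffering are all hypothetical illustrations, not the poster's real API -- a real output stage would wrap something like Spreadsheet::WriteExcel behind the same tiny interface.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical output stage: every input/filter/output stage exposes
# the same small interface, so stages can be swapped and chained.
package My::Output::Delimited;

sub new {
    my ($class, %opt) = @_;
    # SEP defaults to a comma; ROWS buffers output for this sketch.
    return bless { SEP => $opt{SEP} // ',', ROWS => [] }, $class;
}

# Accept one row (arrayref of fields); a real version would stream
# to FILE_NAME instead of buffering in memory.
sub put_row {
    my ($self, $row) = @_;
    push @{ $self->{ROWS} }, join $self->{SEP}, @$row;
    return $self;
}

sub as_string { join "\n", @{ $_[0]{ROWS} } }

package main;

my $out = My::Output::Delimited->new(SEP => '|');
$out->put_row([ 'ENTRY', 'URL' ]);
$out->put_row([ 42, 'http://example.com' ]);
print $out->as_string, "\n";
```

The point is the uniform interface, not the formatting logic: once DBI, Excel, and XML all hide behind the same few methods, the transform layer never cares which one it's talking to.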
The core code for each partner becomes a rather small data transform that "plugs in" to this framework using its own specification. Contrary to what some higher-ups would like it to look like, it really is not as simple as defining one SQL view per partner. Each partner has its own filtering requirements that would be difficult to accomplish in the database alone. It would be possible by writing PL/SQL packages (this is Oracle for the most part), but so much of what they want is much, much simpler in Perl. That filtering becomes part of the transform. It's written in such a way that it's concise and readable, and it's very easy to see what each partner gets. This will make no sense out of the context of the system, but here's an example of a fairly simple transform:

```perl
my @transform = (
    DBQuery => {
        DB_NAME => $self->{DB_NAME},
        SQL     => $sql,
    },
    RowAdd => {
        VALUES => $fixed_cols,
    },
    SiteURL => {
        ID_COL     => 'ENTRY',
        URL_COL    => 'URL',
        SOURCE_COL => 'X_ID',
    },
    XML => {
        XML_TEMPLATE => $html_template,
        STRICT_XML   => 0,
        NEWLINES     => 0,
        DATA_MODE    => 1,
        DATA_INDENT  => 3,
        FILE_NAME    => $output->{PATH},
    },
);
```

Basically we abstract it out enough that we can use Perl itself as a fairly reasonable 4GL-ish thing -- specifications are pretty much free of all those scary Perl things like regular expressions. 8-)
Once you've canned one of these up, there's a common tool (both a common Perl package hierarchy and a command line tool built on those packages) that's used to "run" the transforms. The actual data then ends up in its desired format in the directory structure. Another piece of the tool set (integrated into the packages and the command line tool) takes care of the desired delivery (ftp, http, ssh, sftp, etc.), again doing the bulk of the work with available CPAN packages that we fit into a well-defined set of interfaces.
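"Running" a transform like the one above can be sketched as walking the name => spec pairs and dispatching each stage by name. The dispatch table, stage names, and row representation below are illustrative stand-ins, not the real tool's internals:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stage handlers: each takes (spec, rows) and returns
# transformed rows. Real stages would be DBQuery, RowAdd, XML, etc.
my %stage = (
    Upcase => sub { my ($spec, $rows) = @_; [ map { uc } @$rows ] },
    Prefix => sub {
        my ($spec, $rows) = @_;
        [ map { $spec->{TEXT} . $_ } @$rows ];
    },
);

# Walk the pairs two at a time, threading the rows through each stage.
sub run_transform {
    my ($transform, $rows) = @_;
    my @pairs = @$transform;
    while (my ($name, $spec) = splice @pairs, 0, 2) {
        my $handler = $stage{$name} or die "no such stage '$name'";
        $rows = $handler->($spec, $rows);
    }
    return $rows;
}

# A tiny spec in the same name => hashref shape as the real transforms.
my @transform = (
    Upcase => {},
    Prefix => { TEXT => 'row: ' },
);

my $result = run_transform(\@transform, [ 'abc', 'def' ]);
print "$_\n" for @$result;   # prints "row: ABC" then "row: DEF"
```

A flat pair list (rather than a hash) is a deliberate choice here: it preserves stage order and lets the same stage name appear more than once in a pipeline.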
What I find coolest about this is that, unlike database-oriented transform tools, I can transform virtually anything if I can fit it into the model. With CPAN, I can almost always find code that does the bulk of what I want. My job then becomes fitting it into the model. As you might expect, the model usually expands and becomes more robust as I accommodate more and more things. I've been building it for over 3 years. 8-) But back to the "doing anything": I've plugged in transforms that take data in, then use LWP to poke that data into web site forms, for example. Try doing that with traditional database transform tools.
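An "output" stage that posts into a web form instead of writing a file might look something like the sketch below. The endpoint URL and form field names are made up for illustration; only the LWP::UserAgent and HTTP::Request::Common usage is real CPAN API:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

my $ua = LWP::UserAgent->new(timeout => 30);

# Turn one row (hashref) into a form POST. The URL and the
# entry/url field names are hypothetical.
sub row_to_request {
    my ($row) = @_;
    return POST 'http://partner.example.com/submit',
        [ entry => $row->{ENTRY}, url => $row->{URL} ];
}

# In a real pipeline each row flowing out of the transform would be
# submitted; this sketch only builds the request (no network):
my $req = row_to_request({ ENTRY => 42, URL => 'http://example.com' });
# $ua->request($req) would actually submit the form.
print $req->method, ' ', $req->uri, "\n";
```

Because this is just another stage behind the common interface, "deliver to a web form" plugs in exactly like "write a CSV" does.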
Now, all that being said, it's a pretty significant thing to attempt to build in a short time. This was an evolutionary process; I consider what I've described here the third generation of that evolution. Generation #1 was just quick-and-dirty scripting -- get it out. That's where I cut my teeth on things like DBI. So even though I threw that all away, it was a very valuable tool for learning and for helping me understand the problem domain. Generation #2 is still being used in transition. That was a much earlier model of what's here, and much, much simpler. Basically it was set up to address needs that most or all of the early partners had; what they wanted didn't differ much at all in a lot of areas. What I did there that led to what I have now is this:

Long-winded answer (and I haven't even had coffee yet), but maybe there's something you can take out of it and use for your case.