Wow! I have joked about this kind of thing, but never thought anyone would actually consider it.
Where I work, we use Informatica for our ETL. It works quite nicely, and fairly efficiently.
A year or so ago, my team decided that the current set of marts that had evolved were not truly a data warehouse, and that it was time to design a true data warehouse that could more efficently produce the sorts of marts that more people in our company would find useful.
It was at *that* time that the question of doom was laid upon us: "What tools do you need to make your project successful?"
We set out on a quest: to find the tools that would make us successful. With a consultant in tow to help us, we came up with some criteria.
* Parallel Execution
* compatibility with external scheduler (we use CA7)
* Changed Data Capture (with DB2)
* integrated Data Profiler
* integrated Metadata gathering and Repository
We weighted the criteria and evaluated several components. We arrived at the decision that Ab Initio was the best tool in most categories.
We put our recommendations together in a presentation and went to management, who promptly said "Thats too expensive".
It was at that point that I joked (out of earshot of the VP of course) that we should write our own ETL with Perl.
Let me explain the criteria that we used, in hopes of helping you avoid some of our problems.
First, and most importantly for us, was the matter of Metadata. The existing set of marts were developed for a small subset of the company, which was originally a seperate company purchased by a much larger entity. They worked fine for the smaller company, but had serious problems when the information was to be presented to other parts of the larger company.
Metadata was proposed as a stepping stone from effectiveness for the subset, to effectiveness for the superset. If we could document the business rules used to collect and present the data in the Marts, we would then have the information we needed to either convert/ explain the data in the mart for general consumption, or have the ammunition needed to justify re-development of the mart for a wider audience.
Without a metadata plan in place, we spent a lot of time researching what had been done to the data on its way to the mart so we could explain why it didn't look the way the business partners expected.
We liked Ab Initio's Metadata plan a bit better than Informatica's because the metadata model was more accessible and extensible, and the level of detail was greater.
Ab Initio's parallel execution is integrated better into the tool than Informatica's. Informatica has parallel execution at the server level, that is, if you want parallelism, you run it on more than 1 server. Whereas Ab Initio has parallelism down into the maps/ graphs/ programs themselves.
Both tools offer Changed Data Capture, which for us is essential because there are applications on the source system that allow changes to the data without requiring the change timestamp to be updated, as well as users with direct access to the data making un-audited changes. Both tools interface with the DB2 standard log interface to search for change entries on the requested tables.
Finally, the last 2 criteria, schedulable via external tools, and Profiler integration, were fairly equal. Ab Initio generates Kornshell scripts that can be scheduled by just about anything. Informatica has similar hooks, but not as accessible as kornshell. Both tools have data profiling tools available also; but Ab Initio's seem to run better.
I guess my main point in all of this is, ETL itself is not the biggest issue one faces when doing data warehousing, and just about any tool can move data effectively from one platform to another. Metadata, when addressed from the outset, can impact your success much more, and is directly affected by your choice of ETL tools.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.