in reply to ETL in Perl
I recently assisted on some ETL work using Microsoft’s toolsets ... and soon enough wound up piecing together a little bit of Perl code to work with the underlying (XML-based) definition files of that system. (I also wound up writing quite a bit of Visual Basic code ... (yuck!) ... to do a great many things that the pure-visual environment could not “quite” do.)
As the process became bigger, it also became more complex, and it became increasingly difficult to sit down and feel like you actually understood what was going to happen when you mashed the Start button. I felt very uncomfortable with that. A graph is only comprehensible when it fits on a single uncluttered page.
Maybe I am just an old Luddite, but I really do embrace having source code as the basic way of defining to the computer what I want the computer to do. I know how to diff such files. I know how to work with them easily in version-control.
Having said that ... I, too, would like to avoid having to write and maintain “large amounts of code” by hand in any language, including Perl. I would look for (or build?) some system that allowed me to define the processes, the data relationships and so on, and which would then just-in-time generate the necessary (Perl, of course...) source code.
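To make the idea concrete, here is a minimal sketch of "define it, then generate the Perl." Everything in it is an assumption made up for illustration: the `%step` layout, the column names, and the `generate_source` helper do not come from any real tool. The generated text stays ordinary source code, so it can still be diffed and checked into version control.

```perl
use strict;
use warnings;

# Hypothetical declarative definition of one ETL step: which source
# column feeds which target column, plus an optional transform written
# as a snippet of Perl that operates on $_.
my %step = (
    name    => 'load_customers',
    columns => [
        { from => 'cust_nm', to => 'name',  transform => 's/^\s+|\s+$//g' },
        { from => 'cust_em', to => 'email', transform => '$_ = lc' },
        { from => 'cust_id', to => 'id' },
    ],
);

# Turn the definition into the source text of an anonymous subroutine
# that maps one input row (hash ref) to one output row (hash ref).
sub generate_source {
    my ($def) = @_;
    my $src = "sub {\n    my (\$in) = \@_;\n    my %out;\n";
    for my $col (@{ $def->{columns} }) {
        my $xform = defined $col->{transform} ? "$col->{transform}; " : '';
        $src .= "    { local \$_ = \$in->{'$col->{from}'}; "
              . $xform
              . "\$out{'$col->{to}'} = \$_; }\n";
    }
    $src .= "    return \\%out;\n}\n";
    return $src;
}

# Generate just in time, compile with string eval, then run one row.
my $source = generate_source(\%step);
my $code   = eval $source or die "generated code failed to compile: $@";
my $row    = $code->({
    cust_nm => '  Ada Lovelace ',
    cust_em => 'ADA@Example.COM',
    cust_id => 42,
});
print "$row->{name} <$row->{email}> #$row->{id}\n";
```

Printing `$source` instead of evaluating it gives you a file you can read before you mash the Start button. Note that string `eval` runs whatever the definitions contain, so this only makes sense when the definitions are as trusted as the code itself.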
Dispatching the resulting definitions for parallel execution is a different problem, and a relatively easy one to handle “generically.” (Where I am parked right now, a rather archaic version of Tivoli Workload Scheduler is performing that task quite well. It smells bad but it works.) The advantage here is that any sort of workload can be dispatched in this way... “ETL” or otherwise. It is very limiting if “ETL works this-way but nothing else does,” or when the system that is doing ETL has no way to balance itself against other work that might be going on upon the same machine at the same time.
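As a rough illustration of why dispatching can be handled generically, the sketch below runs a list of jobs with a cap on how many run at once, using only `fork` and `wait`. The `dispatch` function and the job names are hypothetical, and this is in no way a stand-in for a real scheduler such as the one mentioned above. The point is only that the dispatcher never needs to know whether a job is “ETL” or something else.

```perl
use strict;
use warnings;

# Run jobs in child processes, at most $max_parallel at a time.
# Each job is a hash ref with a name and a code ref that returns
# true on success. Returns a hash ref of name => exit status.
sub dispatch {
    my ($max_parallel, @jobs) = @_;
    my (%running, %status);
    while (@jobs or %running) {
        # Start jobs until the concurrency cap is reached.
        while (@jobs and keys(%running) < $max_parallel) {
            my $job = shift @jobs;
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: do the work and report the result via exit code.
                exit($job->{run}->() ? 0 : 1);
            }
            $running{$pid} = $job->{name};
        }
        # Block until any child finishes, then record its exit status.
        my $pid = wait();
        last if $pid == -1;
        next unless exists $running{$pid};
        $status{ delete $running{$pid} } = $? >> 8;
    }
    return \%status;
}

# Hypothetical workload: two ETL steps and one unrelated job that fails.
my @jobs = (
    { name => 'load_customers', run => sub { 1 } },
    { name => 'load_orders',    run => sub { 1 } },
    { name => 'broken_job',     run => sub { 0 } },
);
my $status = dispatch(2, @jobs);
print "$_ exited with $status->{$_}\n" for sort keys %$status;
```

A cap like `$max_parallel` only limits this one workload. Balancing against other work on the same machine needs a scheduler that can see all of it, which is the argument for keeping dispatch outside the ETL tool.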
Re^2: ETL in Perl
by runrig (Abbot) on Sep 16, 2010 at 20:24 UTC