mr.dunstan has asked for the wisdom of the Perl Monks concerning the following question:

Hi folks, this is a vague question but should be a good hot topic - I have to do a ton of data migration/conversion in our database, consolidating and denormalizing a couple of sets of tables into a single set.

My cow-orkers are trying to get me to use SQL only to get the job done, but I think using perl (which I know better anyway) will afford me a lot of flexibility and precision, as well as allowing me to massage the data a little more. I'm pretty good with AoH's and even more mutated structures so I'm pretty confident I can get the job done with perl.

I am even wondering if using XML as an intermediate step when I transform the data would be a good thing. I am certain it will help me visualize the data before and after transformation, but is it necessary?

Uh, obviously this is somewhat new territory for me ... the main thing I'm hoping for is to get the job done quickly and accurately, and hopefully get a script out of this that I can use multiple times without having to code a lot of changes .... I'm hoping the wisdom of the monks may put me on the path to righteousness (or at least well formed data structures!)

mr.dunstan
humble novice
  • Comment on Data Migration Perl style vs. SQL style....

Replies are listed 'Best First'.
Re: Data Migration Perl style vs. SQL style....
by voyager (Friar) on Jun 21, 2001 at 22:56 UTC
    If the transformations are complicated (i.e., require a lot of procedural logic) and the datasets are relatively small, go ahead and use Perl.

    But if the transformations are straight forward and the datasets are large, use SQL. The SQL engine knows how do deal with large amounts of data, can do joins, sorts, etc. all very fast even on large amounts of data. I'd use SQL for all that I could and use Perl for any scripting or procedural logic.

    And if you have non-Perl folks involved, using SQL tables as staging areas allows the data to be viewed/verified with what ever tools you have, e.g. Crystal, or any iSQL tool.

Re: Data Migration Perl style vs. SQL style....
by VSarkiss (Monsignor) on Jun 22, 2001 at 00:22 UTC
    For converting large datasets, my usual rule is "Use SQL when you can, use Perl when you have to". Perl is terrific for coding transformations (more fun too), but there's just no substitute for doing work in the server, without moving the data out and back in.

    The last couple of times I've had to do things like this, I developed the transforms in SQL, but used Perl to control the execution. It's a good combination that allows you to exploit the strengths of both tools.

    Something else to consider: if this is new territory for you, use it as an opportunity to get good with SQL. Try to avoid cursors and row-by-row processing: use set-oriented operations as much as possible. Avoid "procedural SQL".

    Another tip: Back up everything before you start! ;-)

    HTH

      Thanks - that's great advice. I was afraid this would be too arbitrary or obvious a question but I got great advice on this one!

      mr.d
Re: Data Migration Perl style vs. SQL style....
by MrCromeDome (Deacon) on Jun 21, 2001 at 23:29 UTC
    I tend to agree with voyager and your coworkers on this one: if the job is as big as it sounds, use SQL to do the job. As soon as you start using tools external to the DBMS to start manipulating large sets of data, the speed at which the manipulation occurs is greatly decreased. While I find that DBI is pretty quick and flexible, it can never match the speed at which you can run SQL at the server console (well, at least not that I've seen!)

    It certainly is nice to use/want to use perl for the manipulation, but most DBMSes that I've seen these days have some pretty decent functions for manipulating and transforming data.

    Just my two bits. Hope it is some use.
    MrCromeDome

      Thanks to everyone.

      I am basically doing this in Oracle only, I am just a little afraid that in the worst case scenario I might have to do some synchronization between all the tables instead of a one-time migration... but I think the answer is to only use SQL (takes less time to code) and shoot for a single cutover, the way super confident programmers (not necessarily me) should!

      I am also chomping at the bit to use XML and Perl on something, but you guys have shown me the light. Thanks again.

      mr.dunstan
Re: Data Migration Perl style vs. SQL style....
by Abigail (Deacon) on Jun 22, 2001 at 02:49 UTC
    I agree with the previous posters. Having been a DBA in a previous life, I know it seldomly pays to try to outsmart a database server. People trying to do so either don't know much about SQL and their database, or do really know a lot about SQL, their database and the programming language they use. But if you belonged to the latter category, you wouldn't have asked here. ;-).

    Some reasons why the database is faster: 1) SQL isn't a general purpose language; its purpose is to deal with relational data. 2) The database knows the current state - what is in cache, what other queries are being performed, which indices are available, etc. An external program usually doesn't use this information to determine the optimal way of performing the task. 3) No network traffic. 4) Threading.

    -- Abigail

Re: Data Migration Perl style vs. SQL style....
by dimmesdale (Friar) on Jun 21, 2001 at 23:27 UTC
    Well, it was said straightforward before, but I'll offer a little of my advice. SQL isn't standarized across databases, so if by wanting use "multiple times" you mean with multiple databases, then Perl is one way to go. Though, to standardize SQL, check out OBDC (DBD::, or Win32::). It was said already, but its important enough to repeat: use Perl if there's going to be many logical steps involved with a small to moderate sized database; otherwise SQL is your friend--well actually, a combination of the two is useful, as well.

    Back to my point, DBD::OBDC and Win32::OBDC do a decent job of making SQL coding across databases seamless(something DBI didn't bother to even try). Though, it doesn't bar you from using the special SQL the database provides. SO, if you are going to be doing a lot of SQL and you want to be able to take it accroos databases, check out those modules.

Re: Data Migration Perl style vs. SQL style....
by Desdinova (Friar) on Jun 22, 2001 at 07:46 UTC
    If you are still intrested in the topic in general (using perl to manipulate data sets) Check out this book Data Munging with PERL It give a real good background on dealing with different types of data and has helped me alot in figuring things like this out.
      Funny, I bought that at the same time I posted this question! Thanks for responding ... I'm glad to know I'm
      • a) on the right track,
      • b) not insane,
      • c) insane but crazy like everyone else or at least
      • d) not alone!

      mr.d