For the purpose of the discussion, let's just say that $db is an initialised DBI database handle and $ss is a prepared and executed statement handle. I might select some information with
select foo, bar, baz, rat from sometable
I then have to insert that info across a number of different tables. The actual SQL might look like
insert into t1 (this, that) values (foo, bar)
insert into t2 (him, her, it) values (foo, baz, rat)
insert into t3 (ping, pong, thing) values (foo, rat, 17)
where foo, bar et al. are of course the values returned from the select. The code to do the above looks like this:
    while( my $d = $ss->fetchrow_hashref() ) {
        insert 't1', {
            this => $d->{foo},
            that => $d->{bar},
        };
        insert 't2', {
            him => $d->{foo},
            her => $d->{baz},
            it  => $d->{rat},
        };
        insert 't3', {
            ping  => $d->{foo},
            pong  => $d->{rat},
            thing => 17,
        };
    }
That's about as far as I can strip the problem domain back down to the essentials. But wait, there's more. After I had written about 12 scripts on this principle, two problems arose. Firstly, the thing was just too damned slow, because I was using $db->do(). I was building up a string that contained the insert statement in longhand, with something like
    my $sql = "insert into $table ("
        . join(',' => sort keys %$data)
        . ') values ('
        . join(',' => @{$data}{sort keys %$data})
        . ')';
or something along those lines (the original code has since been thrown away), using a hash slice, and sorting the keys to make sure the names and values lined up correctly, which, upon reflection, was maybe overly paranoid, but hey, it works. The do had to go; I had to start thinking about using placeholders and prepare and execute. And Devel::DProf showed me that the two sorts per insert added a non-trivial amount of execution time as well.
So the thing to do was to check the table name and, the first time it was seen, generate an insert statement with placeholders and stash that away in a hash.
insert into t1 (this, that) values (?, ?)
insert into t2 (him, her, it) values (?, ?, ?)
insert into t3 (ping, pong, thing) values (?, ?, ?)
And since I had a hash hanging around anyway, I figured I could also stash away a reference to an array of the sorted SQL column names, and do away with several million C<sort>s. I also ran into the problem that your standard DBD::Sybase doesn't allow multiple active statement handles per $db, so I created one DBI object per table being inserted into.
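The caching scheme can be sketched roughly like this. This is a hypothetical reconstruction, not the original code; cached_statement and %stmt are names of my own invention:

```perl
use strict;
use warnings;

# The first time a table name is seen, build the placeholder SQL and
# stash it, together with the sorted column names, in a hash keyed
# by table name. Later calls get the cached entry back untouched.
my %stmt;    # table name => { sql => '...', cols => [...] }

sub cached_statement {
    my ($table, $data) = @_;
    $stmt{$table} ||= do {
        my @cols = sort keys %$data;
        {
            sql  => "insert into $table ("
                  . join(',' => @cols)
                  . ') values ('
                  . join(',' => ('?') x @cols)    # one ? per column
                  . ')',
            cols => \@cols,
        };
    };
    return $stmt{$table};
}
```

In the real thing, the sql string would be handed to $db->prepare once, and every subsequent call would just run $sth->execute(@{$data}{@{$s->{cols}}}) using the stashed column order, so the sorts happen once per table instead of twice per row.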
This worked well for a while: total execution time fell to a couple of hours. But then, after a while, I had to do something like
    insert 't2', {
        him => $d->{foo},
        her => $d->{baz},
        it  => $d->{rat},
    };
    insert 't2', {
        him => $d->{foo},
        her => 5,
    };
which meant that my carefully prepared insert statement blew up on the second call. So I fixed that with an ugly hack to distinguish the different ways of inserting into a table, on the few occasions it was required
    insert 't2', {
        him => $d->{foo},
        her => $d->{baz},
        it  => $d->{rat},
    }, 'want-it';
    insert 't2', {
        him => $d->{foo},
        her => 5,
    }, 'no-it';
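Under that scheme, the cache key becomes the table name plus the tag, so each (table, column-set) pair gets its own prepared statement. A hypothetical sketch, again with names of my own:

```perl
use strict;
use warnings;

# Key the statement cache on table name *plus* the tag, so two calls
# inserting different column sets into the same table no longer
# collide on one prepared statement.
my %stmt;    # "table/tag" => { sql => '...', cols => [...] }

sub cached_insert_info {
    my ($table, $data, $tag) = @_;
    $tag = 'default' unless defined $tag;   # assumed default tag
    my $key  = "$table/$tag";
    my @cols = sort keys %$data;
    $stmt{$key} ||= {
        sql  => "insert into $table ("
              . join(',' => @cols)
              . ') values ('
              . join(',' => ('?') x @cols)
              . ')',
        cols => \@cols,
    };
    return $stmt{$key};
}
```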
So, with that, I can finally show the snippet of code that is the point of this writeup: the insert subroutine.
I think the AUTOCOMMIT and COMMIT_BATCH constants are pretty easy to figure out.
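For what it's worth, the commit-batching part alone can be sketched like this, assuming the obvious reading of the two constants (these names come from the writeup, but the bodies here are my guess, not the original code):

```perl
use strict;
use warnings;

# Assumed semantics: if AUTOCOMMIT is true the driver commits each
# row itself and we do nothing; otherwise we commit explicitly every
# COMMIT_BATCH rows to keep the transaction log from ballooning.
use constant AUTOCOMMIT   => 0;
use constant COMMIT_BATCH => 1000;

my $inserted = 0;

sub maybe_commit {
    my ($db) = @_;           # the DBI handle for the table in question
    return if AUTOCOMMIT;    # driver already commits per statement
    ++$inserted;
    $db->commit if $inserted % COMMIT_BATCH == 0;
}
```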