For the purpose of the discussion, let's just say that $db is an initialised DBI database handle and $ss is a prepared and executed statement handle. I might select some information with
select foo, bar, baz, rat from sometable
I then have to insert that info across a number of different tables. The actual SQL might look like
insert into t1 (this, that) values (foo, bar)
insert into t2 (him, her, it) values (foo, baz, rat)
insert into t3 (ping, pong, thing) values (foo, rat, 17)
where foo, bar et al. are of course the values returned from the select. The code to do the above looks like this:
    while( my $d = $ss->fetchrow_hashref() ) {
        insert 't1', {
            this => $d->{foo},
            that => $d->{bar},
        };
        insert 't2', {
            him => $d->{foo},
            her => $d->{baz},
            it  => $d->{rat},
        };
        insert 't3', {
            ping  => $d->{foo},
            pong  => $d->{rat},
            thing => 17,
        };
    }
That's about as far as I can strip the problem domain back down to the essentials. But wait, there's more. After I had written about 12 scripts on this principle, two problems arose. Firstly, the thing was just too damned slow, because I was using $db->do(). I was building up a string that contained the insert statement in longhand, with something like
    my $sql = "insert into $table ("
        . join(',' => sort keys %$data)
        . ') values ('
        . join(',' => @{$data}{sort keys %$data})
        . ')';
or something along those lines (the original code has since been thrown away), using a hash slice, and sorting the keys to make sure the names and values lined up correctly, which, upon reflection, was maybe overly paranoid, but hey, it works. The do had to go; I had to start thinking about using placeholders and prepare and execute. And Devel::DProf showed me that the two sorts per insert added a non-trivial amount of execution time as well.
So the thing to do was to check the table name and, the first time it was seen, generate an insert statement with placeholders and stash that away in a hash.
insert into t1 (this, that) values (?, ?)
insert into t2 (him, her, it) values (?, ?, ?)
insert into t3 (ping, pong, thing) values (?, ?, ?)
And since I had a hash hanging around anyway, I figured I could also stash away a reference to an array of the sorted SQL column names, and do away with several million C<sort>s. I also ran into the problem that your standard DBD::Sybase doesn't allow multiple active statement handles per $db, so I created one DBI object per table being inserted into.
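The caching scheme can be sketched roughly like this. This is a hypothetical reconstruction, not the original code; cached_statement and %stmt are names of my own invention:

```perl
use strict;
use warnings;

# The first time a table name is seen, build the placeholder SQL and
# stash it, together with the sorted column names, in a hash keyed
# by table name. Later calls get the cached entry back untouched.
my %stmt;    # table name => { sql => '...', cols => [...] }

sub cached_statement {
    my ($table, $data) = @_;
    $stmt{$table} ||= do {
        my @cols = sort keys %$data;
        {
            sql  => "insert into $table ("
                  . join(',' => @cols)
                  . ') values ('
                  . join(',' => ('?') x @cols)    # one ? per column
                  . ')',
            cols => \@cols,
        };
    };
    return $stmt{$table};
}
```

In the real thing, the sql string would be handed to $db->prepare once, and every subsequent call would just run $sth->execute(@{$data}{@{$s->{cols}}}) using the stashed column order, so the sorts happen once per table instead of twice per row.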
This worked well for a while: total execution time fell to a couple of hours. But then, after a while, I had to do something like
    insert 't2', {
        him => $d->{foo},
        her => $d->{baz},
        it  => $d->{rat},
    };
    insert 't2', {
        him => $d->{foo},
        her => 5,
    };
which meant that my carefully prepared insert statement blew up on the second call. So I fixed that with an ugly hack to distinguish the different ways of inserting into a table, on the few occasions it was required
    insert 't2', {
        him => $d->{foo},
        her => $d->{baz},
        it  => $d->{rat},
    }, 'want-it';
    insert 't2', {
        him => $d->{foo},
        her => 5,
    }, 'no-it';
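Under that scheme, the cache key becomes the table name plus the tag, so each (table, column-set) pair gets its own prepared statement. A hypothetical sketch, again with names of my own:

```perl
use strict;
use warnings;

# Key the statement cache on table name *plus* the tag, so two calls
# inserting different column sets into the same table no longer
# collide on one prepared statement.
my %stmt;    # "table/tag" => { sql => '...', cols => [...] }

sub cached_insert_info {
    my ($table, $data, $tag) = @_;
    $tag = 'default' unless defined $tag;   # assumed default tag
    my $key  = "$table/$tag";
    my @cols = sort keys %$data;
    $stmt{$key} ||= {
        sql  => "insert into $table ("
              . join(',' => @cols)
              . ') values ('
              . join(',' => ('?') x @cols)
              . ')',
        cols => \@cols,
    };
    return $stmt{$key};
}
```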
So, with that, I can finally show the snippet of code that is the point of this writeup: the insert subroutine.
I think the AUTOCOMMIT and COMMIT_BATCH constants are pretty easy to figure out.
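For what it's worth, the commit-batching part alone can be sketched like this, assuming the obvious reading of the two constants (these names come from the writeup, but the bodies here are my guess, not the original code):

```perl
use strict;
use warnings;

# Assumed semantics: if AUTOCOMMIT is true the driver commits each
# row itself and we do nothing; otherwise we commit explicitly every
# COMMIT_BATCH rows to keep the transaction log from ballooning.
use constant AUTOCOMMIT   => 0;
use constant COMMIT_BATCH => 1000;

my $inserted = 0;

sub maybe_commit {
    my ($db) = @_;           # the DBI handle for the table in question
    return if AUTOCOMMIT;    # driver already commits per statement
    ++$inserted;
    $db->commit if $inserted % COMMIT_BATCH == 0;
}
```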