zaimoni has asked for the wisdom of the Perl Monks concerning the following question:

Target systems:
For performance reasons, it would be useful to stuff multiple queries into a single communication to a remote MS SQL server through Perl. Something like:
INSERT INTO UberTable (Arg1,...ArgN) VALUES (Val1,...,ValN)
INSERT INTO UberTable (Arg1,...ArgN) VALUES (Val1,...,ValN)
INSERT INTO UberTable (Arg1,...ArgN) VALUES (Val1,...,ValN)
INSERT INTO UberTable (Arg1,...ArgN) VALUES (Val1,...,ValN)
Of course, each of the above lines would be delimited correctly for the intended interface.

Again: Sending n queries separately is causing a very inconvenient performance hit that would be removed by sending n queries in one fell swoop.

I don't know which module will have the intended effect when used correctly. My current code base uses the DBI and DBD::ODBC modules. Alas:

(DBI.pm inline documentation, slightly reformatted)

Multiple SQL statements may not be combined in a single statement handle ($sth), although some databases and drivers do support this (notably Sybase and SQL Server).

(end DBI.pm inline documentation)

If a more effective module exists, please name all that you immediately know of. Source code fragments are unnecessary, unless it's a creative application of the DBI module system.

Thank you for your time.

Re: Batch DB query -- implementation?
by cchampion (Curate) on Jul 03, 2002 at 15:54 UTC
    There is no workaround (that I know of) for the DBI limitation, but be aware that MySQL supports the extended INSERT format.
    INSERT INTO UberTable (Arg1,...ArgN) VALUES (Val1,...,ValN), (Val1,...,ValN), (Val1,...,ValN), (Val1,...,ValN);
    Besides sending several rows at once, the whole batch counts as ONE statement: the query is parsed only once and executes much faster than N separate statements.
    The only drawback is that you have to be careful not to exceed the maximum packet length allowed for a query (it depends on your server set-up; check it by issuing a "SHOW VARIABLES" query).
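    For anyone wanting to try this through the DBI, here is a minimal sketch of building such an extended INSERT with placeholders. The helper name and the column names are made up for illustration; the table name is just the one from the question.

```perl
# Sketch: build one extended INSERT plus its flat bind list for N rows.
# build_extended_insert() is a hypothetical helper, not a DBI method.
sub build_extended_insert {
    my ($table, $cols, $rows) = @_;    # $cols and $rows are array refs
    my $tuple = '(' . join(',', ('?') x @$cols) . ')';
    my $sql   = sprintf 'INSERT INTO %s (%s) VALUES %s',
                        $table, join(',', @$cols),
                        join(',', ($tuple) x @$rows);
    my @binds = map { @$_ } @$rows;    # flatten rows into one bind list
    return ($sql, \@binds);
}

# Then the whole batch is a single round trip:
#   my ($sql, $binds) = build_extended_insert('UberTable',
#       ['Arg1', 'Arg2'], \@rows);
#   $dbh->do($sql, undef, @$binds);
```

    Note this only works on servers that accept the multi-row VALUES syntax (MySQL does; check your server before relying on it).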

    HTH

    cchampion
Re: Batch DB query -- implementation?
by Aristotle (Chancellor) on Jul 03, 2002 at 15:23 UTC
    You have answered your own question:
    although some databases and drivers do support this (notably Sybase and SQL Server).
    From the documentation for DBD::Sybase:
    DBD::Sybase has the ability to handle multi-statement SQL commands in a single batch. For example, you could insert several rows in a single batch like this:
    $sth = $dbh->prepare("
        insert foo(one, two, three) values(1, 2, 3)
        insert foo(one, two, three) values(4, 5, 6)
        insert foo(one, two, three) values(10, 11, 12)
        insert foo(one, two, three) values(11, 12, 13)
    ");
    $sth->execute;
    ____________
    Makeshifts last the longest.

      DBD::Sybase isn't even installed :)

      Within the DBI, the target modules I immediately need verified are DBD::ODBC (the gateway I'm currently using to MS SQL Server) and DBD::MySQL. I'm already biased against DBD::MySQL, since I have experimented with the following:

      'DELETE FROM UberTable WHERE WonderVar=?; OPTIMIZE UberTable'
      and variants, which crash when sent through the DBI.

        I don't know if you made the mistake just off the top of your head, but the syntax is OPTIMIZE TABLE tablename. Note that you can send it as a separate query anyway. So far, nothing DBI-related here.

        I think what you have to understand is that DBI intentionally keeps a low feature profile and avoids trying to be intelligent. It caters mostly to the lowest common denominator, trying to ensure that it is, to the extent possible, entirely DBMS-independent. However, driver authors are encouraged to implement interfaces to DBMS-dependent features in their modules for those who wish to take advantage of unportable features.

        So the place to find your answers is the driver's POD. DBD::ODBC mentions "multiple result sets" a fair number of times, so I'd say it's safe to assume it supports what you want. The changelog says the distribution comes with tests for multiple result sets, so fetch the tarball from CPAN and look at the included tests (the .t scripts in the t/ directory) for example code.

        Of course, as everyone else has told you, you may well find yourself looking at a negligible performance improvement.

        Makeshifts last the longest.

Re: Batch DB query -- implementation?
by perrin (Chancellor) on Jul 03, 2002 at 15:48 UTC
    I seriously doubt that the network communication is slowing you down much here. I've inserted hundreds of thousands of rows in seconds with DBI. You might do better to tune your database instead of worrying about this.
      The network connection for the MS SQL Server is IP. Ping time from my location is between 160 and 220 ms.
        Are you using bind variables? I believe that will reduce the amount of data sent over the network.
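        For reference, the usual bind-variable pattern is to prepare once and execute per row, so only the bound values travel for each row. A self-contained sketch; DBD::SQLite with an in-memory database is assumed here purely so the fragment can run stand-alone, and the table is made up.

```perl
use strict;
use warnings;
use DBI;

# Sketch: prepare once, execute once per row with bound values.
# DBD::SQLite is an assumption for the sake of a runnable example.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE UberTable (Arg1 INTEGER, Arg2 INTEGER)');

my $sth = $dbh->prepare(
    'INSERT INTO UberTable (Arg1, Arg2) VALUES (?, ?)');
$sth->execute(@$_) for ([1, 2], [4, 5], [10, 11]);
```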
Re: Batch DB query -- implementation?
by mpeppler (Vicar) on Jul 03, 2002 at 18:11 UTC
    Like perrin I very much doubt that batching your requests is really going to improve things for the MS-SQL side of things.

    You are using placeholders, which with MS-SQL means that only the data for each parameter is actually sent on each execute. The number of replies from the database server will always be the same whether you send a bunch of inserts as a single buffer or not: at the protocol level you'll get one reply for each inserted row.

    For MySQL the situation may be a little different, as I believe that DBD::MySQL emulates placeholders (that is, it interpolates the values into the SQL string before sending it to the server for each row).
    But I don't really know MySQL - so I don't know if there is a way around that.

    Finally - when you say that DBD::MySQL dies on multiple statements separated by ';', that is perfectly normal. What you have to realize is that SQL clients that use the ';' character to separate SQL commands in reality send each command to the server separately, essentially doing something like

    @cmds = split(/;/, $sql);    # yes I know - that regexp won't do the right thing!
    foreach (@cmds) {
        $sth = $dbh->prepare($_);
        $sth->execute;
        # etc...
    }

    Hopefully this little explanation will help :-)

    Michael

Re: Batch DB query -- implementation?
by rdfield (Priest) on Jul 03, 2002 at 15:44 UTC
    Not sure what you have in the way of bind variable types in SQL Server, but in Oracle the standard way to do that (i.e. a massive batch load) is to create a stored procedure that uses a 'forall' construct to bypass the SQL layer for a bound array of input values, and call that stored procedure with an array of bind variables.
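    The closest portable analogue in the DBI itself is execute_array, which hands the driver a whole column-wise batch at once; drivers without native array binds fall back to per-row execute. A sketch, again assuming DBD::SQLite and a made-up table only so the example is self-contained.

```perl
use strict;
use warnings;
use DBI;

# Sketch: column-wise batch insert via DBI's execute_array.
# DBD::SQLite in-memory is an assumption so the example runs
# stand-alone; it will use DBI's per-row fallback internally.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE UberTable (Arg1 INTEGER, Arg2 INTEGER)');

my $sth = $dbh->prepare(
    'INSERT INTO UberTable (Arg1, Arg2) VALUES (?, ?)');
my $tuples = $sth->execute_array(
    { ArrayTupleStatus => \my @status },
    [ 1, 4, 10 ],     # all Arg1 values
    [ 2, 5, 11 ],     # all Arg2 values
);
```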

    rdfield

    msg me for an example if req'd