mod_perl was hosed by a bind variable

phildog has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: mod_perl was hosed by a bind variable (1 broken) by tye (Sage) on Jan 30, 2004 at 02:48 UTC
One does `where mailbox = 'test'` and one does `where mailbox = test`. I'd suspect the latter is so much faster because it is a syntax error (or perhaps just doesn't match any records). - tye	[reply] [d/l] [select]
Re: mod_perl was hosed by a bind variable by perrin (Chancellor) on Jan 29, 2004 at 22:21 UTC
You must have been preparing the query every time then. The fastest way to use DBI is with prepare_cached() and bind variables. This is in the mod_perl docs.	[reply]
Re: Re: mod_perl was hosed by a bind variable by dws (Chancellor) on Jan 29, 2004 at 22:40 UTC
The fastest way to use DBI is with prepare_cached() and bind variables. This is in the mod_perl docs. Is this true even when the underlying database (e.g., MySQL, in this case), doesn't support prepared queries?	[reply]
Re: Re: Re: mod_perl was hosed by a bind variable by perrin (Chancellor) on Jan 29, 2004 at 22:53 UTC
Yes, because it skips a bunch of object creation and stuff on the client side.	[reply]
A reply falls below the community's threshold of quality. You may see it by logging in.
Re^2: mod_perl was hosed by a bind variable (tangent) by tye (Sage) on Jan 30, 2004 at 16:24 UTC
Since perrin keeps pressing this point, I wanted to mention that, though perrin's advice is reasonable, this first conclusion ("You must have been preparing the query every time then") doesn't make any sense in the situation being discussed. If someone complains that using a placeholder makes their code much slower, then caching the query cannot be part of the explanation, since the "fast" case makes it impossible to cache the query so caching could only explain it if caching made things slower. - tye	[reply]
Re: Re^2: mod_perl was hosed by a bind variable (tangent) by perrin (Chancellor) on Jan 30, 2004 at 20:53 UTC
If someone complains that using a placeholder makes their code much slower, then caching the query cannot be part of the explanation That was my point, actually: that if not using placeholders appeared to make it faster, then he must not have been caching the query, which means he was missing out on a large part of the speed offered by placeholders.	[reply]
A reply falls below the community's threshold of quality. You may see it by logging in.
Re: mod_perl was hosed by a bind variable by stvn (Monsignor) on Jan 29, 2004 at 23:05 UTC
Something in the back of my head is saying that DBD::MySQL does not truely support SQL placeholders, and instead emulates them. So it doesn't surprise me that you saw better performance by not using them. After tooling around the net for a little while, I didn't find anything more conclusive on DBD::MySQL placeholders being emulated or not. I know the older versions of DBD::MySQL didn't support it. Here is a link to that; Tim Bunce talking about DBD::MySQL, look in the PARAMETER BINDING section for this ... Neither engine supports placeholders, but the DBD::mysql and DBD::mSQL drivers provide full emulation. Question marks are used as placeholders, as in ... I also found this more recent link on the DBI Dev message board. They are talking about named placeholders, but in it this person mentions that ... DBD::mysql does the placeholder parsing in the DBD driver so you can substitue a placeholder for anything, including the whole of the SQL statement if you so desire (with a little bit of trickery (such as binding as an integer)). I would be very interested to know for sure who/where/what does the placeholder parsing with DBD::MySQL. And if it would really make a difference anyway. -stvn	[reply]
Re: Re: mod_perl was hosed by a bind variable by dws (Chancellor) on Jan 29, 2004 at 23:20 UTC
MySQL 4.0x doesn't support prepared queries, so they're emulated. Emulation means hanging on to the query, and expanding the '?' markers at `execute` time, substituting appropriately quoted values. However, this isn't a heavyweight operation, particularly not when compared with the overhead of shipping the query over the wire to the server, and waiting for the server to crank on it and return a result. Assuming (and we have to assume, since we haven't seen code yet, and what we have been shown doesn't verify that results are identical) that query that arrives at the database server process is identical in either case, and assuming that a standalone (non-mod_perl) test doesn't show the the same disparity in timing, then something underneath `bind_param` is interacting with mod_perl. On a light read of DBD.pm and DBI::MySQL.pm, I can't see what that would be. Curious.	[reply] [d/l] [select]
Re: mod_perl was hosed by a bind variable by The Mad Hatter (Priest) on Jan 29, 2004 at 22:00 UTC
The only problem with interpolating it into the SQL statement is that if it comes from an outside (that is, untrusted) source (as it seems in your examples), then anyone can maliciously or unintentionally include characters that mean things to SQL. For example, imagine someone putting something like this into $mailbox (presumably gotten from the CGI values): `"test"; delete from bar;`	[reply] [d/l]
Re: Re: mod_perl was hosed by a bind variable by stvn (Monsignor) on Jan 29, 2004 at 22:29 UTC
MadHatter, Actually not true, check out what the DBI docs say about such SQL injection attacks: (In the "General Interface Rules & Caveats" section on this page) Multiple SQL statements may not be combined in a single statement handle ($sth), although some databases and drivers do support this (notably Sybase and SQL Server). Although this is by no means a reason not to validate your strings before dumping them into a DB. -stvn	[reply]
Re: Re: Re: mod_perl was hosed by a bind variable by tilly (Archbishop) on Jan 30, 2004 at 01:39 UTC
You have to work harder (for instance put the nasty stuff in a correlated subquery), but MadHatter's point is still fundamentally true. An SQL injection attack can be used to truly devastating effect through well-known techniques that people can easily Google for if they want.	[reply]
Re: mod_perl was hosed by a bind variable by cchampion (Curate) on Jan 29, 2004 at 23:37 UTC
Why don't you show some more code? I suspect that the reason for your magic efficiency gain could lie in MySQL query cache. If you want to perform a meaningful comparison, you should turn it off in both cases. If you don't, you get a long time for your first run and a very short time for each run after that. I might be wrong, but try to run your tests in a reverse order, i.e. first the query with a placeholder and then the one without.	[reply]
Re: mod_perl was hosed by a bind variable by chromatic (Archbishop) on Jan 29, 2004 at 22:11 UTC
My BOFH calendar blames sunspots. Can you post more code?	[reply]
Re: mod_perl was hosed by a bind variable by dws (Chancellor) on Jan 29, 2004 at 22:27 UTC
Do you see as dramatic a different if you switch to using the `$query->execute($mailbox);` [download] form? (A light read of DBI.pm suggests that it shouldn't, but it might be an interesting experiment.)	[reply] [d/l]
Re: Re: mod_perl was hosed by a bind variable by phildog (Novice) on Jan 29, 2004 at 22:37 UTC
Wow, what a coincidence! While waiting for some responses I was leafing through an old copy of "Writing CGI Applications with Perl" and saw that is how they did it: `$query->execute($mailbox);` So I tested that before I even saw your question :-) And no, it wasn't any faster than using the bind_param call...	[reply] [d/l]
Re: mod_perl was hosed by a bind variable by fireartist (Chaplain) on Jan 30, 2004 at 11:58 UTC
Here's some benchmark code which shows there's not much difference at all using bind values. I suspect tye's explanation above is correct. Results `> ./test.pl Benchmark: timing 50000 iterations of bind, inline... bind: 53 wallclock secs (23.00 usr + 8.93 sys = 31.93 CPU) inline: 52 wallclock secs (21.68 usr + 8.98 sys = 30.66 CPU)` [download] Code #!/usr/bin/perl -w use strict; use DBI; use Benchmark qw/timethese/; use Config::File; my $c = Config::File->new( file => 'cppsu.config' ); my $h = DBI->connect( $c->db_datasource, $c->db_user, $c->db_pass, { $c->db_args } ); timethese( 50000, { bind => 'xbind', inline => 'xinline', }); sub xbind { my $sql = 'SELECT edited FROM faq WHERE id = ?'; my $sth = $h->prepare( $sql ); $sth->execute( 1 ); my @res = $sth->fetchrow_array; die unless @res == 1; } sub xinline { my $sql = 'SELECT edited FROM faq WHERE id = 1'; my $sth = $h->prepare( $sql ); $sth->execute; my @res = $sth->fetchrow_array; die unless @res == 1; } [download]	[reply] [d/l] [select]
Re: Re: mod_perl was hosed by a bind variable by perrin (Chancellor) on Jan 30, 2004 at 14:56 UTC
If you switch to using prepare_cached for the xbind version, it should make a difference. It would not be possible to use prepare_cached for the xinline version in most cases, since that would results in a new query being cached and held in memory for every possible value of id. Also, you should actually do the variable substitution in xinline to make this realistsic, i.e. "id = $id".	[reply]
Re: Re: Re: mod_perl was hosed by a bind variable by fireartist (Chaplain) on Jan 30, 2004 at 16:48 UTC
I know that prepared_cache would make a big difference, but I wasn't interested in testing that, just the queries that were shown in the root node. Thanks for pointing it out though.	[reply]
Re: Re: mod_perl was hosed by a bind variable by phildog (Novice) on Jan 30, 2004 at 15:52 UTC
Your benchmark is interesting, but not very relevant... Note the subject: "mod_perl was hosed by a bind variable" I'm not claiming bind variables alone make any difference in query speed, I'm claiming that bind variables in conjunction with mod_perl (under versions noted above) make a HUGE difference. So why don't you try again? Note: your test will probably need to split into two separate files to measure correctly as run under mod_perl.	[reply]
Re: Re: Re: mod_perl was hosed by a bind variable by fireartist (Chaplain) on Jan 30, 2004 at 18:31 UTC
Unfortunately I don't have access to a mod_perl server, however it's reasonable to assume initially that it's not a mod_perl specific problem, unless you also ran benchmarks on a non-mod_perl server; remove layers of complexity to isolate the problem. I have run some more benchmarks, first I changed the WHERE clause to use a text column and tried it using bind and using non-quoted inline, such as tye suggested was a syntax error. This actually resulted in using bind being faster. Results `> ./test.pl Benchmark: timing 50000 iterations of bind, inline... bind: 55 wallclock secs (19.04 usr + 8.97 sys = 28.01 CPU) inline: 81 wallclock secs (20.21 usr + 11.32 sys = 31.53 CPU)` [download] Code `sub xbind { my $sql = 'SELECT edited FROM faq WHERE q = ?'; my $sth = $h->prepare( $sql ); $sth->execute( 'q' ); my $res = $sth->fetchall_arrayref; $xbind = @{$res}; } sub xinline { my $sql = 'SELECT edited FROM faq WHERE q = q'; my $sth = $h->prepare( $sql ); $sth->execute; my $res = $sth->fetchall_arrayref; $xinline = @{$res}; }` [download] So, I modified it to check how many rows were being returned, the bind version was returning 1, the inline version was returning all rows. My mysql version is 3.23.49, DBI version 1.30 - completely different generation of db than you, so tests are not equal. So if you got the same results with these tests, then you should try adding back in layers such as mod_perl, to see which is to cause.	[reply] [d/l] [select]
Re: mod_perl was hosed by a bind variable by Anonymous Monk on Jan 30, 2004 at 00:18 UTC
I am highly highly skeptical of your benchmark, and until you provide some benchmark code, i'm inclined no to believe your results.	[reply]
Re: mod_perl was hosed by a bind variable by dws (Chancellor) on Jan 30, 2004 at 16:29 UTC
phildog, how about an update on this? The smart money is betting with tye, who suggests that the speed difference is explainable entirely by the first case failing (silently) and the second case working (silently).	[reply]
Re: Re: mod_perl was hosed by a bind variable by phildog (Novice) on Jan 30, 2004 at 16:56 UTC
will do... just need a little time. later today or Mon at latest. sorry not to put whole sample up immediately, but always hesitant to put up "production" code up front. need to scrub the stuff that gets me in trouble security-wise (but leave in the stuff that makes me look stupid :-)	[reply]
Re: mod_perl was hosed by a bind variable by CountZero (Bishop) on Jan 30, 2004 at 23:31 UTC
Happening to have a mod-perl enabled Apache 2 handy, I tried the following script: #! x:/xampp/perl/bin/perl.exe use strict; use DBI; use CGI qw/:standard/; use Benchmark qw/timethese/; my $dbh=DBI->connect("DBI:mysql:pandirecords",); my $cgi = new CGI; print $cgi->header; print $cgi->start_html('DBI & mod-perl'); print '<pre>'; timethese( 10000, { inline => 'xinline', bind => 'xbind', }); print '</pre>'; print end_html; sub xbind { my $parameter='0000284364'; my $sql = 'SELECT insured_name FROM claims WHERE ref_underwriter = ?' +; my $sth = $dbh->prepare( $sql ); $sth->execute( $parameter ); my @res = $sth->fetchrow_array; } sub xinline { my $parameter='0000284364'; my $sql = 'SELECT insured_name FROM claims WHERE ref_underwriter ="' +. $parameter .'"'; my $sth = $dbh->prepare( $sql ); $sth->execute; my @res = $sth->fetchrow_array; } [download] It had the following result: `cgi-bin Benchmark: timing 10000 iterations of bind, inline... bind: 7 wallclock secs ( 2.56 usr + 0.65 sys = 3.21 CPU) @ 31 +12.36/s (n=10000) inline: 7 wallclock secs ( 2.60 usr + 0.50 sys = 3.11 CPU) @ 32 +20.61/s (n=10000)` [download] and `mod-perl Benchmark: timing 10000 iterations of bind, inline... bind: 7 wallclock secs ( 3.02 usr + 0.61 sys = 3.63 CPU) @ 27 +51.03/s (n=10000) inline: 8 wallclock secs ( 3.10 usr + 0.62 sys = 3.72 CPU) @ 26 +91.07/s (n=10000)` [download] The results are a bit unexpected in that the mod-perl script seems to run a bit slower than the cgi-bin script. I guess this is due to the fact that both the browser, Apache and MySQL were running on the same machine and that the major part of the time was anyhow spent in fetching the records, rather than executing the perl-script. Anyhow, I could not find any significant slowdown between the 'bind' and the 'inline' version of the script, so I must conclude that the huge difference you found must come from somewhere else. CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law	[reply] [d/l] [select]
Re: Re: mod_perl was hosed by a bind variable by dws (Chancellor) on Jan 31, 2004 at 02:41 UTC
I could not find any significant slowdown between the 'bind' and the 'inline' version of the script, so I must conclude that the huge difference you found must come from somewhere else. Note that there's a subtle, but possibly highly significant, difference between the way you're generating your inline query and the way the OP generates his.	[reply]
Re: Re: Re: mod_perl was hosed by a bind variable by CountZero (Bishop) on Jan 31, 2004 at 08:16 UTC
Yes, but it was my intention to check whether there was a difference between the 'bind' and the 'inline' versions over cgi and mod-perl execution using very similar syntax in both versions, the only difference being the use of the '?' placeholder. I did not expect any difference (placeholders being emulated in DBD:MySQL), nor could I indeed find any significant difference, leading me to conclude that there is no interference between cgi/mod-perl as far as using place-holders is concerned. So the reason must be found somewhere else. Update: I tried using the `$sth->bind_param(1, $parameter)` to see if it made a difference: nope, all remained the same (within the to be expected statistical variations). So either it must be a data-issue (seems rather far-fetched) or something with the OP's code (but we haven't seen it, so I cannot comment). CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law	[reply] [d/l]
Re: Re: mod_perl was hosed by a bind variable by CountZero (Bishop) on Jan 31, 2004 at 08:09 UTC
Thinking that perhaps there was any influence of the MySL results cache, I changed the queries to `SELECT SQL_NO_CACHE insured_name FROM claims WHERE ...` so as to "switch off" the cache for this query. The results did not change (perhaps because the query already was very optimised and used the primary key to find the unique result row). CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law	[reply] [d/l]