Re: Much slower DBI on RHEL6
by mbethke (Hermit) on Feb 05, 2014 at 22:38 UTC
|
Try running the query with the mysql commandline client to see whether you get a significant runtime difference. That way you could isolate the problem to mysql if possible.
In any case I don't see how the code could possibly work the way the comments indicate was intended. If 'id' is indeed a record ID as is customary in relational DBs, the DISTINCT does nothing. If not, it's indeed useful but then the results will be read one id at a time and nothing like "the hash will only keep one key but it will be the oldest one" happens. Then the same id is read again together with a single description. It looks like a
SELECT DISTINCT id,description from list_index
would do the same job. Or the whole script as
my $sql = 'SELECT DISTINCT id,description FROM list_index';
foreach(@{ $dbh->selectall_arrayref($sql, {Slice => {}}) }) {
$labels{$_->{id}} = "$_->{id} - $_->{description}";
}
If that doesn't help, try and check the DDL, i.e. the CREATE TABLE, especially the indices on both tables. | [reply] [d/l] [select] |
|
+---------+--------------------------------+--------------------+
| id | description | rpt_key |
+---------+--------------------------------+--------------------+
| ABC | Organizational Detail Newer | ABC.2013_10_18 |
| ABC | Organizational Detail Older | ABC.2012_10_15 |
| XYZ | XYZ Status | XYZ.2012_08_05 |
| XYZ | XYZ Status | XYZ.2012_08_12 |
| XYZ | XYZ Status | XYZ.2013_08_14 |
+---------+--------------------------------+--------------------+
So the values I would want are:
ABC - Organizational Detail Newer
XYZ - XYZ Status
| [reply] [d/l] [select] |
|
| [reply] |
|
| [reply] |
Re: Much slower DBI on RHEL6
by erix (Prior) on Feb 05, 2014 at 23:22 UTC
|
Are the database tables the same size?
Do the database tables have the exact same data?
Do the database tables have the same indexes?
Perhaps one machine (wisely) chooses to scan the table because the search conditions are not specific enough for an index (which they were on the other machine).
In short, there isn't really enough information...
update: In general, it's a shame that database questions tend to talk of "meh, slow...", "fast!", "yippee, much faster!!" but omit to post timings and/or pertinent database plans (EXPLAIN PLAN variants). Also, or even especially, when the case is solved.
| [reply] |
|
| [reply] |
|
Perl 5.10 IIRC was one of the less optimised Perl versions. I think some benchmarks put it 20-50% slower than 5.8. 5.12 and 5.14 improved the speed much. (Sorry, no sources to back this up at this time.)
| [reply] |
|
|
|
Hi,
this thread is very interesting. How have you measured that mysql is gathering the informations at the same speed? Can you present the mysql version numbers on the different distributions? Have you checked whether the same query plan is used by mysql?
Best regards
McA
P.S.: You use the same hardware for RHEL4 and RHEL6 and the same Linux config?
| [reply] |
|
It is the exact same data... same table structure, etc... the data was dumped on the RHEL 4 and imported onto the RHEL 6 machine.
| [reply] |
Re: Much slower DBI on RHEL6
by Tux (Canon) on Feb 06, 2014 at 16:39 UTC
|
Triggered by some other issues that came up in this thread, I reworked my speed test for access methods. The conclusion might be that when you run your database locally, as in no network delays for databases running on another host, the speed difference between the access methods is huge. fetchrow_hashref should not be used when speed really matters and the number of records visited is relatively big (for one record you won't notice a difference, as the overhead of setting up the handle will make the real fetch work disappear in noise).
Assuming you already prepared your statement handle in $sth, DBI's access-methods are:
HR while (my $ref = $sth->fetchrow_hashref) {
# use $ref->{c_base} and $ref->{base}
A while (my ($c_base, $base) = $sth->fetchrow_array) {
# use $c_base and $base
AR while (my $ref = $sth->fetchrow_arrayref) {
# use $ref->[0] and $ref->[1]
DAR while (my $ref = DBI::st::fetchrow_arrayref ($sth)) {
# use $ref->[0] and $ref->[1]
BC $sth->bind_columns (\my ($c_base, \$base));
while ($sth->fetch) {
# use $c_base and $base
DBC $sth->bind_columns (\my ($c_base, \$base));
while (DBI::st::fetchrow_arrayref ($sth)) {
# use $c_base and $base
When measured against relatively actual installations of the databases, the results are (relative speed, higher is better):
perl-5.18.2-64int-ld perl-5.16.2-64all-ld
DBI-1.631 DBI-1.631
DBD::Oracle-1.68 DBD::Oracle-1.68
Oracle 12.1.0.1.0 Oracle 11.2.0.3.0
-> 11.2.0.3.0 -> 11.2.0.3.0
HR 639 HR 387
DBC 804 A 1000
AR 913 AR 1142
BC 956 DBC 1186
DAR 967 DAR 1201
A 999 BC 1250
DBD::SQLite-1.40 DBD::SQLite-1.40
HR 255 HR 250
A 1000 A 1000
AR 1041 AR 1172
DAR 1166 DAR 1244
BC 1333 BC 1250
DBC 1455 DBC 1313
DBD::Pg-3.0.0 DBD::Pg-3.0.0
PostgreSQL 9.3.2 PostgreSQL 9.2.4
HR 198 HR 179
A 1000 A 1000
AR 1143 AR 1279
DAR 1186 DAR 1281
DBC 1326 BC 1417
BC 1332 DBC 1485
DBD::<span style="background:gold">CSV</span>-0.41 DBD::<spa
+n style="background:gold">CSV</span>-0.41
HR 683 HR 791
A 1000 A 1000
AR 1164 BC 1153
DAR 1171 AR 1169
BC 1181 DAR 1176
DBC 1185 DBC 1186
DBD::mysql-4.025 DBD::mysql-4.026
MariaDB-5.5.33 MariaDB-5.5.33
HR 102 HR 95
A 999 A 1000
AR 1126 BC 1245
DAR 1271 AR 1438
BC 1287 DAR 1509
DBC 1328 DBC 1612
DBD::Firebird-1.16
firebird-2.5.2.26539
HR 565
DAR 991
AR 997
A 1000
DBC 1005
BC 1179
As you can see, BC wins over HR by a factor way over 10 on MySQL. YMMV.
Enjoy, Have FUN! H.Merijn
code | [reply] [d/l] [select] |
|
Does Perl's OO method call really produce that much of a slowdown? (BC vs DBC is 3,3 vs 3,9 in SQLite's case.) How does that happen?
while ($sth->fetch) { # bound columns
# most of the time a negligible difference
# but this is sometimes faster?
while (DBI::st::fetchrow_arrayref ($sth) { # 'DBC'
Oh, and any chance of having DBD::Firebird tested, too? | [reply] [d/l] |
|
DBD::Firebird added to the list I posted before. I can't say I was positively surprised in installation and startup. OpenSUSE had the packages readily available, so installation went smooth. service started immediate, but then the shit hits the fan: I find the quick start guide way too Windows-minded and none of the commands is intuitive. Furthermore, the DBD installation is a hell: it does not find the needed libraries of header files in what I thought were pretty default locations. The firebird (and firebird-devel) packages installs the libfbclient.so.2 but no (symbolic) link to libfbclient.so. Creating of a new database has no command-line-tool. All and all I am not very charmed yet.
Enjoy, Have FUN! H.Merijn
| [reply] |
|
Re: Much slower DBI on RHEL6
by Anonymous Monk on Feb 06, 2014 at 10:15 UTC
|
I don't know what's wrong with it -- you should try some place specialising in MySQL. It seems unlikely that Perl/DBI are at fault unless RHEL has done something wonky with the relevant packages.
Anyway, you seem to be doing a death by a thousand queries there and that means _many_ round-trips. I'm sure the script can be rewritten to perform a single query. This is how I'd do it with Postgres-specific syntax:
SELECT DISTINCT ON (id) id, description
FROM list_index li
ORDER BY id, rpt_key DESC;
Writing it with a correlated subquery, we get a query that is terribly slow (one second on my couple-of-thousand-rows dataset):
SELECT DISTINCT id,
(SELECT description from list_index li_i
WHERE li_i.id = li_o.id
ORDER BY created_at DESC limit 1) AS description
FROM list_index li_o;
But moving the distinctness to a subquery makes it fast (~7 milliseconds):
SELECT id,
(SELECT description from list_index li_i
WHERE li_i.id = li_o.id
ORDER BY created_at DESC limit 1) AS description
FROM
(SELECT DISTINCT id FROM list_index) li_o;
The SQL should work on every dialect, but of course, these speed measurements apply only to the PostgreSQL query planner. MySQL's is different. Try and see.
...My brains aren't working well enough right now to produce a way to do it without a dependent subquery. | [reply] [d/l] [select] |
|
And now that my brains are working again, a third option that is quicker than all of those on Postgres, but YMMV:
select li_i.id, description from list_index li_i
join (
select id, max(rpt_key) as rpt_key from list_index group by id
) li_o
on li_i.id = li_o.id and li_i.rpt_key = li_o.rpt_key;
I assume here that the pair (id, rpt_key) is unique, but it should not hurt even if it isn't. | [reply] [d/l] [select] |
|
| [reply] |
|
|
|
Bah. s/created_at/rpt_key/
| [reply] [d/l] |
Re: Much slower DBI on RHEL6
by McA (Priest) on Feb 06, 2014 at 11:23 UTC
|
Hi,
not an answer to your main question. But a kind of annotation looking at your code:
IMHO you're not using the possibility of prepared statements and bind variables. Look at this code:
my $dbh = $me->{DBHANDLE};
my %labels;
# Get all the different ids
my $sql = 'SELECT DISTINCT id FROM list_index ';
my $sth = $dbh->prepare($sql);
$sth->execute;
my $sql2 = 'select id, description from list_index where id = ? order
+ by rpt_key desc limit 1';
my $sth2 = $dbh->prepare($sql2);
while(my $hashref = $sth->fetchrow_hashref) {
$sth2->execute($hashref->{id});
while(my $hashref2 = $sth2->fetchrow_hashref) {
$labels{$hashref2->{id}} = $hashref2->{id} . ' - ' . $hashref
+2->{description};
}
}
$sth2->finish;
$sth->finish;
return \%labels;
With this code you gain two things: a) You don't prepare a new sql statement per loop iteration. b) You use bind variables which is almost always better than inserting values like you did it (sql injection). c) It should be measurable faster.
Best regards
McA
| [reply] [d/l] |
|
my $dbh = $me->{DBHANDLE};
my %labels;
# Get all the different ids
my $sth = $dbh->prepare ("select distinct id from list_index");
$sth->execute;
$sth->bind_columns (\my $id);
my $sth2 = $dbh->prepare (qq;
select description
from list_index
where id = ?
order by rpt_key desc;
# -- I don't know if limit 1 is needed
);
$sth2->execute (0);
$sth2->bind_columns (\my $desc);
while ($sth->fetch) {
$sth2->execute ($id);
while ($sth2->fetch) {
$labels{$id} = "$id - $desc";
}
}
$_->finish for $sth, $sth2;
return \%labels;
Enjoy, Have FUN! H.Merijn
| [reply] [d/l] |
|
| [reply] |
|
|
|
thank you EVERYBODY for the responses. I didn't mention it before but I had tried optimizing things by using different types of fetches, binding, etc. but nothing seemed to help that much. However, I do have things working much faster now( even faster than it was on RHEL 4)!. I don't know why, but here is what I found thanks to tux's post. I do need the "limit 1" because otherwise, I don't get the most recent description, I get the oldest. However, for whatever reason, when I use desc AND limit 1 things are slloooww. Without "limit 1" things are fast... So, it is much faster for me to iterate through all of the rows of ids just to get the "latest" than it is for me to sort in descending order and get just 1. Hope that makes sense. Thanks again everyone. My code is much more optimized and I'm sure throughout the application there are things can be optimized and more secure so I am going to work on that.
So, in short, still don't know why the same exact Perl code, with the same data, is much slower on RHEL 6 but my issue is resolved. So this is the code I ended up using and it is faster and seems to give me the correct results
my $dbh = $me->{DBHANDLE};
my %labels;
# Get all the different ids
my $sth = $dbh->prepare ("select distinct id from list_index");
$sth->execute;
$sth->bind_columns (\my $id);
my $sth2 = $dbh->prepare (qq;
select description
from list_index
where id = ?
order by rpt_key;
);
$sth2->execute (0);
$sth2->bind_columns (\my $desc);
while ($sth->fetch) {
$sth2->execute ($id);
while ($sth2->fetch) {
$labels{$id} = "$id - $desc";
}
}
$_->finish for $sth, $sth2;
return \%labels;
| [reply] [d/l] |
|
Re: Much slower DBI on RHEL6
by karlgoethebier (Abbot) on Feb 05, 2014 at 21:46 UTC
|
Really same my.cnf?
Regards, Karl
«The Crux of the Biscuit is the Apostrophe»
| [reply] [d/l] |
|
| [reply] |
|
| [reply] |
|
|
|
|
|
Re: Much slower DBI on RHEL6
by ruzam (Curate) on Feb 06, 2014 at 20:35 UTC
|
Well, just to commiserate, not that long ago I migrated an application from a CentOS 5 server to a CentOS 6 server. The CentOS 6 server was faster hardware by every measure, and you could feel it.
But my MySQL table data load times (insert statements through Perl DBI) nearly tripled. Spent too much time searching for a cause and solution without success. Despite the stupid slow load times, the application itself 'did' run faster overall for day to day use (or at least fast enough) and I left it at that. Probably an averaging out of code execution vs query execution
There 'is' something wrong with the Perl/DBI/MySQL performance in CentOS 6. I continue with the assumption that somebody will eventually find the reason and a patch will fix it. In the mean time I accept that the table loads will be slower (a not too often initialization step) and the daily operation is fast enough.
| [reply] |
|
How are you handling commits? I think "autocommit" is turned on by default (commit after every execution of an insert or update), and that can be outrageously slow when slogging through a long list.
I wrote a general-purpose command-line tool for doing inserts/updates using data from stdin or a file, and the best thing about it is the option that lets me set the commit interval. If I forget to use that option on an input of many thousands of rows, I really regret it, but when I set it to commit every 500 or 1000 rows, it flies, and I love it.
| [reply] |