Speeding up the DBI

Those of us who have learned the DBI from the book, or from the official perldoc some versions ago, may not have noticed that the DBI, in recent versions, has added several new features and improved existing ones.

This is my case, anyway. I started working with the DBI from version 1.13, and only recently I opened my eyes to a whole new set of improvements.

This article discusses some of such features, with particular emphasis to the ones aiming at the application efficiency.

Choosing the right idiom
relative speed of fetch* methods
fetchall_arreyref and large results in batches
Creating the example table

The example table in MySQL
The example table in SQLite

Profiling and benchmarking the DBI

Profiling and benchmarking basics
How to use profiling
More ways of using DBI::Profile
A heavy load profiling and benchmarking script

Parting thoughts

Choosing the right idiom

Let's start with the easy stuff.

This is not really an improvement in recent versions, but I believe it is important to mention it here, since there could be some misunderstanding about this issue.

When issuing queries that do not return any result, the official way is to use the $dbh->do() method. Nothing wrong with that in principle, provided that we don't use it inside a critical loop.

for (@values) {
   my $affected =
      $dbh->do(qq{INSERT INTO $table (col1, col2) VALUES (?, ?)},
        undef, @$_);
}
[download]

What's wrong with this code? Nothing, except the do method is quietly creating and destroying a statement handler at each loop.

It's better to rewrite the loop this way.

my $sth = $dbh->prepare(qq{INSERT INTO $table (col1, col2) VALUES (?, 
+?)});
for (@values) {
   # do something with the values
   my $affected = $sth->execute (@$_);
}
[download]

The performance gain is achieved even for those databases (like MySQL) that don't support a prepare feature. Avoiding the creation and destruction of a statement handler in large loops can speed up things a lot.

Even better, if your database supports it, use the multiple INSERT statement.It requires more work on the client side, but you can insert thousands of records at once.

my $query = qq{INSERT INTO $table (column1, column2) VALUES };
my @values = (['a','b','c'],
              ['d','e','f'],
              ['g','h','i'] );

my $start=0;
for (@values) {
    $query .= ',' if $start++;
    $query .= '('
       . (join (",", map { $dbh->quote($_)} @$_))
       .')';
}
my $affected = $dbh->do($query);
[download]

The above code works with MySQL. You should adjust to your database SQL syntax if different. See below a sample script with a more complete example of this idiom.

Relative speed of fetch* methods

You know that fetchrow_arrayref is faster than fetchrow_array(). And you also know that fetchall_arrayref and selectall_arrayref are faster than fetchrow_arrayref. The question is how much faster? And, more important, is are they always faster?

There are some interesting benchmarks in one of Tim Bunce's presentations (See the whole stuff and the benchmarks).

Using a test MySQL table of 100_000 records with 20 columns of 9 chars each, Tim says that fetchrow_array is 12% slower than fetchrow_arrayref.

method	columns	fetches per second	gain
fetchrow_array	1	51,155	--
fetchrow_array	20	24,032	--
fetchrow_arrayref	1	58,653	12% faster
fetchrow_arrayref	20	49,390	51% faster
fetchall_arrayref	1	348,140	the fastest
fetchall_arrayref	20	42,128	slower than fetchrow_arrayref

Notice first of all that fetchall_arrayref is slower than fetchrow_arrayref when fetching a relatively large amount of data. Why? Because the gain in speed is balanced by the amount of time needed to allocate memory for the results. Actually, if you call fetchall_arrayref without giving any size limit, you may very well get into trouble, either exhausting your memory or slowing your application response time unbearably.

But, wait a minute! How did he come up with such figures? How can you tell how many fetches per second your application has got?

You may think that, since he is the DBI creator, he must have a special test suite that he uses to amaze listeners at the Open Source conventions. Nothing like that. You can get the same results with the help of DBI::Profile.

But before coming to that, let's first approach the subject of dealing with large results.

fetchall_arrayref and large results in batches

We have seen that fetchall_arrayref is the fastest method you can use for fetching rows. The problem is that, when you have a large dataset that can even become larger as time goes, you are faced with the problem of memory allocation. Fetching 100,000 rows with 20 columns of 9 chars each is 17 MB, but Perl is not going to allocate simply that. Due to Perl's dynamic memory allocation for arrays, your 17 MB are likely to become 20 or more. And surely there are cases where you don't want that much memory locked up. But anyway, even if Perl were allocating just the memory it exactly takes to store your data and the records were 1 million or 10 million, then you would definetely avoid fetchall_arrayref.

So, unless you really need to have all your dataset nicely stored in your client memory all at once, you should either use fetchrow_arrayref or use another particular idiom that Tim Bunce recommends.

The truth is, despite the name, you don't have to fetch all the records fetchall_arrayref. You can pass a $max_rows argument, which will make fetchall_arrayref behave like a fetchrow_arrayref on steroids.

$sth->execute; # dataset has 1,000,000 records
#
# DON'T DO THIS
# my $aref = $sth->fetchall_arrayref;
#
# It would create a 1,000,000 rows array

my $max_rows = 5_000;

while (my $aref = $sth->fetchall_arrayref(undef, $max_rows))
{
    # do something with $aref
    # $aref now contains (at most) 5,000 rows
};
[download]

Passing $max_rows to the method we give a limit of the number of rows that we want returned to our array reference at any given time. So instead of having an array of 1,000,000 rows, you are going to have just a 5,000 rows array, which will be refreshed 200 times until all rows are processed.

If you want to process the rows one by one and still take advantage of fetchall_arrayref speed, you can use this handy idiom.

my $rowcache;
while (my $aref = shift(@$rowcache)
|| shift ( @{$rowcache= $sth->fetchall_arrayref(undef, $max_rows)} )
)
{
    # do something with $aref
    # $aref now contains only one row
};
[download]

$aref gets the next available row from $rowcache. When it is empty, another call to fetchall_arrayref will replenish the cache, and so on till the end.

Creating the example table

Before profiling, we need to create a suitable data set. Here are two scripts to create a 100,000 records table with MySQL and SQLite.

The example table in MySQL

The following script creates a table "testdbi" in the "test" database. You can change the number of wanted records, columns, sizes, and the number of rows per insert query.

This scripts gives you also an example of how to build a large INSERT query with multiple lines. The default value is 10,000 rows per query. If your system has a small max_allowed_packet (the default is 1MB) you can either increase its size or reduce the number of rows.

#!/usr/bin/perl -w
# create a MySQL test database
use strict;
use DBI;
my $db = shift || 'test';

my $dbh = DBI->connect("dbi:mysql:$db"
            . ";mysql_read_default_file=$ENV{HOME}/.my.cnf", undef, un
+def,
# replace with the following line if you don't have a config file
#       , "username", "password",
        {RaiseError => 1})
   or die "can't connect\n";

my $max_fields         = 20;
my $field_size         = 9;
my $max_records        = 100_000;
my $max_rows_per_query = 10000;   # adjust this value to max_allowed_p
+acket
my $count              = 0;
my $text               = 'abcdefghi';
my $inserted           = 0;

$dbh->do(qq{drop table if exists testdbi});
$dbh->do(qq{create table testdbi
    ( }
    . (join ",", map {"id$_ char($field_size) not null"} (1..$max_fiel
+ds))
    .qq{ , primary key (id1)
    )});

my $base_query = qq{INSERT INTO testdbi VALUES};
my $query = $base_query;

# create a multiple insertion statement
# INSERT ... VALUES (...), (...), (...) ... ;
#
for (1..$max_records) {
    $query .= ',' if $count++;
    $query .= '('
        . join( ",", map( {"'". ($text++)."'"}(1 .. $max_fields)))
        .')';
    if ($count >= $max_rows_per_query) {
        $inserted += $dbh->do($query);
        print "$inserted\n";
        $query = $base_query;
        $count = 0;
    }
}
$inserted += $dbh->do($query) if $count;
print "inserted $inserted records\n";
$dbh->disconnect;
[download]

The example table in SQLite

The table created is the same seen for the MySQL example. The main difference is that SQLite does not support multiple lines INSERTs. The point here is the transaction size. If you COMMIT too often, the insertion takes forever. Depending on your system memory, a transaction size ($max_commit) between 1,000 and 5,000 should allow for a creation in a reasonably short time (20 seconds in my P4 1.7Ghz).

#!/usr/bin/perl -w
# create a SQLite test database
use strict;
use DBI;

my $db = shift || 'test';

my $dbh = DBI->connect("dbi:SQLite:$db", "","",
        {RaiseError => 1, PrintError=>0})
   or die "can't connect\n";

my $max_fields  = 20;
my $field_size  = 9;
my $max_records = 100_000;
my $max_commit  = 5_000;
my $text        = 'abcdefghi';
my $inserted    = 0;

my $create_query =
    "create table testdbi ( "
    . (join ",", map {"id$_ char($field_size) not null"} (1..$max_fiel
+ds))
    . ", primary key (id1))";

# Drops existing table, ignoring errors.
# Sort of "DROP TABLE IF EXISTS" in MySQL
eval { $dbh->do(qq{drop table testdbi}) };

$dbh->do($create_query);
$dbh->do('begin');

my $sth = $dbh->prepare(qq{INSERT INTO testdbi VALUES (}
    . (join ",", map {'?'} (1..$max_fields))
    .")" );

for (1..$max_records) {
    $inserted += $sth->execute( map {$text++} (1..$max_fields) );
    if (($inserted % $max_commit) == 0) {
        $dbh->do('commit');
        print "$inserted\n";
        $dbh->do('begin');
    }
}
$dbh->do('commit');
print "inserted $inserted records\n";
$dbh->disconnect;
[download]

Profiling and benchmarking the DBI

To improve the efficiency of a program we need to find its bottlenecks, the parts where the application spends most of the time, so that we can concentrate on the hot spots.

Benchmarking is something we do when we want to compare two or more different approaches, to find the most efficient one.

Profiling, on the other hand, is what we do when we don't know where to look, and we want to know which parts of our application are occupying most of the time.

Profiling and benchmarking basics

Perl script profiling is something you can do using the general purpose module Devel::Profile. This is a module that runs your script in debug mode and tells you how long does it take to run each routine in it. However, it tells you the time spent in each sub looking from outside. You may get a clue of where the bottleneck is, but when dealing with the DBI, where efficiency is paramount, you may want to go further, and get more detail, which should only come from the DBI innards.

Benchmarking is a possibility. You compare a few very intensive routines that use similar DBI methods, to see how they can affect the whole application. Trouble is, for benchmarking a fetch method you should either include in the sub also the execute call or compare the method using two different handlers, thus risking that the general performance of the database may influence your results one way or another.

I am not saying that benchmark are useless. Depending on your needs, they could fit the bill just fine, and actually I use them quite often. I just want to add a word of caution about examining carefully what you are including in your benchmarks, either implicitly or explicitly.

If you don't like benchmarks, or they don't suit your immediate purpose, or simply you want to know what yoy need to benchmark, then the other road to take is DBI::Profile. Unlike Devel::Profile, it works in direct co-operation with the DBI, using appropriate handles to give you accurate measurements of what your DBI methods are doing. It will also profile itself, so you know how much overhead you should consider when evaluating its results.

How to use profiling

The easiest way of enabling profiling is by setting an environment variable.

Given this test script,

#!/usr/bin/perl -w
# test_profile.pl
use DBI;
use strict;

my $dbh = DBI->connect("DBI:mysql:test;host=localhost",
    "user", "password", {RaiseError => 1})
        or die "can't connect\n";
my $query = qq{select count(*) from testdbi};
my $sth = $dbh->prepare($query);
$sth->execute();
print "@$_" while  $_ = $sth->fetchrow_arrayref();
$dbh->disconnect();
[download]

In Windows:

c:\path> set DBI_PROFILE=2
c:\path> perl test_profile.pl
[download]

In Unix:

$ DBI_PROFILE=2 perl test_profile.pl
[download]

And the output should be something like this.

DBI::Profile: 0.002769 seconds 1.09% (16 method calls) test_profile.pl
'' =>
    0.002072s / 10 = 0.000207s avg (first 0.000007s, min 0.000003s, ma
+x 0.001857s)
'select count(*) from testdbi' =>
    0.000697s / 6 = 0.000116s avg (first 0.000119s, min 0.000010s, max
+ 0.000511s)
[download]

Not really enlightening, eh? Let's try with some other level. Using DBI_PROFILE=4 we get more info:

DBI::Profile: 0.002569 seconds 0.46% (16 method calls) test_profile.pl
'DESTROY' =>
    0.000021s / 2 = 0.000010s avg (first 0.000011s, min 0.000010s, max
+ 0.000011s)
'FETCH' =>
    0.000007s
'STORE' =>
    0.000052s / 5 = 0.000010s avg (first 0.000026s, min 0.000003s, max
+ 0.000026s)
'connect' =>
    0.001862s
'default_user' =>
    0.000031s
'disconnect' =>
    0.000106s
'disconnect_all' =>
    0.000014s
'execute' =>
    0.000314s
'fetchrow_arrayref' =>
    0.000041s / 2 = 0.000021s avg (first 0.000028s, min 0.000013s, max
+ 0.000028s)
'prepare' =>
    0.000121s
[download]

Now you can see that every DBI method used in the script is separately profiled. Let's look at the entry for fetchrow_arrayref. The first number is the actual time spent (0.000041 seconds), which, divided by the number of calls (2) gives the average (0.000021 seconds) time for each execution.

Looking at the script again, you see that there is only a "SELECT COUNT(*)" query without "GROUP BY" clause, so it should return only one row. Why is DBI saying that fetchrow_arrayref was called 2 times? I leave this as an exercise for the reader. (Highlight the text below with your mouse to show the answer)

fetchrow_arrayref is called within the while loop until it returns undef.

The first time it is called, it returns the row. The second time, it returns undef, thus telling that there are no more records.

Even in the case of an empty data set, the fetch* method is called at least once, when it returns undef.

More ways of using DBI::Profile

The environment variable method can be useful for a quick and dirty solution. But the DBI::Profile is more than that, and it provides you several more methods to fine tuning your application.

Before I go into details, let me tell you that, for all the profiling methods, you can redirect the output to a file, format the output with DBI::ProfileDumper and read the report using an ad hoc application (dbiprof).

For example

$ DBI_PROFILE=4/DBI::ProfileDumper perl test_profile.pl
$ dbiprof -match key1=fetchrow_arrayref

DBI Profile Data (DBI::ProfileDumper 1.0)

  Program       : test_profile.pl
  Path          : [ DBIprofile_MethodName ]
  Total Records : 1 (showing 1, sorted by total)
  Total Count   : 2
  Total Runtime : 0.000041 seconds

#####[ 1 ]###########################################################
  Count         : 2
  Total Time    : 0.000041 seconds
  Longest Time  : 0.000028 seconds
  Shortest Time : 0.000013 seconds
  Average Time  : 0.000021 seconds
  Key 1         : fetchrow_arrayref
[download]

A heavy load profiling and benchmarking script

As a last issue, I present you a profiling and benchmarking script. It uses the test table created above and tests the fetch of 1, 10, and 20 columns from 100,000 records.

My intention with this script was to replicate Tim Bunce's benchmarks. After running it in a couple of machines, though, I saw that its results are heavily influenced (as you can imagine) by hardware and software configuration. Especially with MySQL, when available RAM and processor speed change substantially, the results may range hugely. In a well optimized server, I calculated a number of fetches 10 times faster than the ones reported by Tim Bunce.

But the biggest surprise was that, in a fast server, the difference between fetchrow_array and fetchrow_arrayref is sometimes leveled. Notice in the results that fetchrow_array seems to be faster than fetchrow_arrayref when fetching one column only. With a slower computer, this was not the case. I am trying to pinpoint the reason for that, and every help in that direction will be appreciated.

#!/usr/bin/perl -w
use strict;
use DBI;
use DBI::Profile;
use Benchmark qw(cmpthese);

my $dbh;
my $db = 'test';
my $driver = shift || 'mysql';

if ($driver eq 'mysql') {
    $dbh = DBI->connect("dbi:mysql:$db",
        "user", "password", {RaiseError => 1})
        or die "can't connect\n";
}
elsif ($driver eq 'SQLite') {
    $dbh = DBI->connect("dbi:SQLite:$db", "","",
        {RaiseError => 1})
   or die "can't connect\n";
}
else {
    # insert your favorite driver connection code here
    die "Driver $driver not supported\n"
}

#
# testdbi is a table with 100,000 records
# with 20 columns 9 chars each
#

my $query1 = qq{SELECT id1 from testdbi };                 # 1 column

my $query10 = qq{
    SELECT
        id1,id2,id3,id4,id5,id6,id7,id8,id9,id10
    FROM testdbi};                                         # 10 column
+s

my $query20 = qq{
    SELECT
        id1,id2,id3,id4,id5,id6,id7,id8,id9,id10,
        id11,id12,id13,id14,id15,id16,id17,id18,id19,id20
    FROM testdbi };                                        # 20 column
+s

sub profile_calls {
    my ($sth, $columns, $max_rows) = @_;
    my (@array, $rowcache, $aref, $href);
    printf "PROFILE - FETCHING %d COLUMN%s\n", $columns, $columns > 1 
+? "s" :"";
    $sth->{Profile} = DBI::Profile->new;
    $sth->{Profile} = 4;
    $sth->execute; while (@array = $sth->fetchrow_array()) {};
    $sth->execute; while ($aref = $sth->fetchrow_arrayref()) {};
    $sth->execute; while ($href = $sth->fetchrow_hashref()) {};
    $sth->execute; while ($aref = shift(@$rowcache)
        || shift (@{$rowcache= $sth->fetchall_arrayref(undef, $max_row
+s)})) {};
    print $sth->{Profile}->format;
    $sth->{Profile} =0;
}

sub compare_calls {
    my ($sth, $columns, $max_rows) = @_;
    my (@array, $rowcache, $aref, $href);
    printf "COMPARE - FETCHING %d COLUMN%s\n", $columns, $columns > 1 
+? "s" :"";
    cmpthese (5,
    {
     '---fetchrow_array' =>
        sub {$sth->execute; while (@array = $sth->fetchrow_array()) {}
+ },
     'fetchrow_arrayref' =>
         sub {$sth->execute; while ($aref = $sth->fetchrow_arrayref())
+ {} },
      '-fetchrow_hashref' =>
        sub {$sth->execute; while ($href = $sth->fetchrow_hashref()) {
+} },
     'fetchall_arrayref' =>
        sub {$sth->execute; while ($aref = $sth->fetchall_arrayref(und
+ef, $max_rows)) {} },
     'fetchall_arrayref(buffered)' =>
        sub {$sth->execute; while ($aref = shift(@$rowcache)
            || shift (@{$rowcache =
               $sth->fetchall_arrayref(undef, $max_rows)})) {} },
    }
    );
}

my $max_rows = 1000;
my $sth1  = $dbh->prepare($query1);
my $sth10 = $dbh->prepare($query10);
my $sth20 = $dbh->prepare($query20);

compare_calls($sth1,   1, 10_000);
compare_calls($sth10, 10, $max_rows);
compare_calls($sth20, 20, $max_rows);

profile_calls($sth1,   1, $max_rows);
profile_calls($sth10, 10, $max_rows);
profile_calls($sth20, 20, $max_rows);

$dbh->disconnect;
[download]

You can run the script for MySQL of SQLite. Just add the driver name in the command line. For other drivers, it should be enough to add an appropriate connection statement. What follows is a sample output for MySQL (4.0.13) under Linux.

#
# NB. Output edited and trimmed to fit the screen
#
COMPARE - FETCHING 1 COLUMN
Benchmark: timing 5 iterations of fetchrow_array, fetchrow_hashref,
fetchall_arrayref, fetchall_arrayref(buffered), fetchrow_arrayref...

   fetchrow_array:      5 secs ( 2.28 usr +  0.73 sys =  3.01 CPU)
 fetchrow_hashref:     11 secs ( 8.81 usr +  0.76 sys =  9.57 CPU)
fetchall_arrayref:      5 secs ( 2.24 usr +  0.77 sys =  3.01 CPU)
fetchall_arrayref(buf): 5 secs ( 2.57 usr +  0.74 sys =  3.31 CPU)
fetchrow_arrayref:      4 secs ( 2.00 usr +  0.71 sys =  2.71 CPU)

Rate fetchrow_hashref fetchall_arrayref(buffered) fetchall_arrayref
fetchrow_array fetchrow_arrayref
                                    1     2     3     4     5
1 fetchrow_hashref       0.522/s    --  -65%  -69%  -69%  -72%
2 fetchall_arrayref(buf)  1.51/s  189%    --   -9%   -9%  -18%
3 fetchall_arrayref       1.66/s  218%   10%    --    0%  -10%
4 fetchrow_array          1.66/s  218%   10%    0%    --  -10%
5 fetchrow_arrayref       1.85/s  253%   22%   11%   11%    --

COMPARE - FETCHING 10 COLUMNs
Benchmark: timing 5 iterations of fetchrow_array, fetchrow_hashref, fe
+tchall_arrayref,
  fetchall_arrayref(buffered), fetchrow_arrayref...

   fetchrow_array:      13 secs ( 9.92 usr +  1.03 sys = 10.95 CPU)
 fetchrow_hashref:      20 secs (16.97 usr +  1.07 sys = 18.04 CPU)
fetchall_arrayref:      11 secs ( 6.64 usr +  1.09 sys =  7.73 CPU)
fetchall_arrayref(buf): 11 secs ( 7.40 usr +  1.12 sys =  8.52 CPU)
fetchrow_arrayref:       7 secs ( 3.81 usr +  0.99 sys =  4.80 CPU)

 s/iter -fetchrow_hashref ---fetchrow_array fetchall_arrayref(buffered
+)
 fetchall_arrayref fetchrow_arrayref
                                    1    2     3     4     5
1  fetchrow_hashref        3.61    --  -39%  -53%  -57%  -73%
2    fetchrow_array        2.19   65%    --  -22%  -29%  -56%
3 fetchall_arrayref(buf)   1.70  112%   29%    --   -9%  -44%
4 fetchall_arrayref        1.55  133%   42%   10%    --  -38%
5 fetchrow_arrayref       0.960  276%  128%   77%   61%    --

COMPARE - FETCHING 20 COLUMNs
Benchmark: timing 5 iterations of fetchrow_array, fetchrow_hashref, fe
+tchall_arrayref,
fetchall_arrayref(buffered), fetchrow_arrayref...

   fetchrow_array:      22 secs (16.91 usr +  1.24 sys = 18.15 CPU)
 fetchrow_hashref:      30 secs (24.53 usr +  1.14 sys = 25.67 CPU)
fetchall_arrayref:      18 secs (11.99 usr +  1.10 sys = 13.09 CPU)
fetchall_arrayref(buf): 19 secs (12.22 usr +  1.12 sys = 13.34 CPU)
fetchrow_arrayref:      10 secs ( 5.17 usr +  1.26 sys =  6.43 CPU)

 s/iter fetchrow_hashref fetchrow_array fetchall_arrayref(buffered)
                            fetchall_arrayref fetchrow_arrayref
                                    1     2     3     4     5
1  fetchrow_hashref        5.13    --   -29%  -48%  -49%  -75%
2    fetchrow_array        3.63   41%     --  -27%  -28%  -65%
3 fetchall_arrayref(buf)   2.67   92%    36%    --   -2%  -52%
4 fetchall_arrayref        2.62   96%    39%    2%    --  -51%
5 fetchrow_arrayref        1.29  299%   182%  107%  104%    --

PROFILE - FETCHING 1 COLUMN
DBI::Profile: 5.487208 seconds (300110 method calls) test_dbi_calls.pl
'FETCH'             => 0.000013s
'STORE'             => 0.000245s
'execute'           => 2.471070s / 4 = 0.617767s avg
'fetchall_arrayref' => 0.236404s / 101 = 0.002341s avg
'fetchrow_array'    => 0.255235s / 100001 = 0.000003s avg
'fetchrow_arrayref' => 0.270196s / 100001 = 0.000003s avg

'fetchrow_hashref'  => 2.254045s / 100001 = 0.000023s avg

PROFILE - FETCHING 10 COLUMNs
DBI::Profile: 8.832431 seconds (300110 method calls) test_dbi_calls.pl
'FETCH'             => 0.000004s
'STORE'             => 0.000219s
'execute'           => 3.649873s / 4 = 0.912468s avg
'fetchall_arrayref' => 0.930104s / 101 = 0.009209s avg
'fetchrow_array'    => 0.536815s / 100001 = 0.000005s avg
'fetchrow_arrayref' => 0.498186s / 100001 = 0.000005s avg
'fetchrow_hashref'  => 3.217229s / 100001 = 0.000032s avg

PROFILE - FETCHING 20 COLUMNs
DBI::Profile: 12.905533 seconds (300110 method calls) test_dbi_calls.p
+l
'FETCH'             => 0.000006s
'STORE'             => 0.000219s
'execute'           => 5.516249s / 4 = 1.379062s avg
'fetchall_arrayref' => 1.488744s / 101 = 0.014740s avg
'fetchrow_array'    => 0.758613s / 100001 = 0.000008s avg
'fetchrow_arrayref' => 0.690056s / 100001 = 0.000007s avg
'fetchrow_hashref'  => 4.451645s / 100001 = 0.000045s avg
[download]

Parting thoughts

This node ran for longer than I thought when I started thinking about it. Actually, I had something more to say on the whole subject, but I guess that for one reading session this should be more than enough.

What have I learned while digging for this presentation? Mostly that I should take the general instructions with caution. I know that method x is faster than method y, which is in turn faster than method z. All beautiful, except sometimes, depending on the design, the network load, the number of records, method z could be faster than method x.

What to do then? Design carefully, test, benchmark and profile. Then re-read your design. If it makes sense together with the numbers you've crunched in your tests, go ahead. Else, spend some more time thinking about the matter. It usually helps.

Update (1) Thanks to dws, who spotted an inconsistency in one of my first examples. Now fixed.

Update (2) Thanks to jeffa, for spotting some unclosed tags. Fixed.

 _  _ _  _  
(_|| | |(_|><
 _|

Comment on Speeding up the DBI Select or Download Code

Replies are listed 'Best First'.
Re: Speeding up the DBI by perrin (Chancellor) on Jul 14, 2003 at 19:14 UTC
If you're going to talk about performance and DBI, you should really mention prepare_cached and bindcols. These things are discussed in this article.	[reply]
Re: Speeding up the DBI by diotalevi (Canon) on Jul 14, 2003 at 13:16 UTC
I don't have time to comment just now but the comparison between the different fetch methods and data set size isn't all that appropriate for PostgreSQL. There you'd use ORDER BY/OFFSET/LIMIT instead of the $max_rows. You'd also contrive to use a cursor which means you'd probably eschew DBI and use Pg or Pg::Simple. That is, if you found yourself working with really large result sets. The DBD::Pg driver always has PostgreSQL complete the entire SQL command and then copies the entire result over to perl. Trying to iterate with ->fetch doesn't help with that - you'd need a cursor (which isn't supported under DBI) or to reduce the data size. Anyone else care to follow up with some concrete advice? I've got to head off to work just now.	[reply]
Re: Re: Speeding up the DBI by tantarbobus (Hermit) on Jul 14, 2003 at 19:32 UTC
Yes it is. DBD::mysql behaves in the same way that DBD::Pg does when it comes to pulling all of the rows from the database server. If you want DBD::mysql to act as if it were using cursors, you can specify mysql_use_rows on the statement handle, and then it will not pull over the whole data set on execute. For DBD::Pg, you can declare a cursor and then execute the fetch in a loop until you have recieved all of the data. Admittedly this coould be done in a cleaner fashion where prepare could take an attribute and if prepare sees that attribute it will declare a cursor and rewrite your statement to use the cursor (which is what Pg::Simple does,IIRC)	[reply]
Re: Re: Re: Speeding up the DBI by diotalevi (Canon) on Jul 14, 2003 at 19:39 UTC
I had been under the impression that cursors don't work through DBD::Pg. I'm mistaken?	[reply]
Re: Re: Re: Re: Speeding up the DBI by tantarbobus (Hermit) on Jul 14, 2003 at 20:04 UTC
Re^3: Speeding up the DBI by Anonymous Monk on Sep 04, 2013 at 18:44 UTC
If you want DBD::mysql to act as if it were using cursors, you can specify mysql_use_rows on the statement handle, and then it will not pull over the whole data set on execute. I think you meant "mysql_use_result", some grief and head banging over that typo, so hopefully that'll help some wanderer around here also trying to speed up $sth->execute	[reply]
Re: Speeding up the DBI by drewbie (Chaplain) on Jul 14, 2003 at 15:01 UTC
Excellent writeup! ++ for the tips. I had noticed Tim's Slides on search.cpan.org > Recent but have not yet had the time to read through them. I guess I will today. :-) Now you've got me thinking I need to do some profiling on Class::DBI and see if there are any speedups I can find. I'm using it on all my projects for the huge increase in programmer productivity. I've always wondered how much it was costing me in DB processing time (not that I've cared up till today... :-)	[reply]
Re: Speeding up the DBI by spazm (Monk) on Dec 17, 2008 at 20:57 UTC
The idiom listed above is slightly flawed, perhaps the behavior of fetchall_arrayref has changed in the last few years? Original: `my $rowcache; while (my $aref = shift(@$rowcache) \|\| shift ( @{$rowcache= $sth->fetchall_arrayref(undef, $max_rows)} ) )` [download] From perldoc DBI: `Here�s an example (assumes RaiseError is enabled): my $rows = []; # cache for batches of rows while( my $row = ( shift(@$rows) \|\| # get row from cache, or +reload cache: shift(@{$rows=$sth->fetchall_arrayref(unde +f,10_000)\|\|[]}) ) ) { ... }` [download] The difference being: `\|\| []`. fetchall_arrayref is returning undef instead of an empty arrayref. The original code as listed will give an annoying "attempt to use undefined value as an ARRAYREF" style error after processing the last line of sql output. The error line will be attributed to the last line of the while loop since the error happens in the while loop comparator.	[reply] [d/l] [select]
Re: Speeding up the DBI by jbrugger (Parson) on Jun 09, 2005 at 10:06 UTC
I found a nice article here. Basically, you want large sets not in an arrayref, but as a hash, since it's more clear what column holds the data. e.g. `# tested this on live data, this is twice as fast as fetchrow_hashref, + and has the same advantage.. # example taken from the link above. my @fields = (qw(emp_id first_name monthly_payment)); $sth->execute; my %rec =(); $sth->bind_columns(map {\$rec{$_}} @fields); print "$rec{emp_id}\t", "$rec{first_name}\t", "$rec{monthly_payment}\n" while $sth->fetchrow_arrayref;` [download] Update Hmm, Funny indeed, anyway, i still consider this as a valuable way to speed up the DBI. :-) "We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise." - Larry Wall.	[reply] [d/l]
Re^2: Speeding up the DBI by cchampion (Curate) on Jun 09, 2005 at 10:22 UTC
Now that was quite funny. :-) You are quoting to gmax from an article that he has written (in case you didn't realize it, the article says that it was originally published as DBI recipes.)	[reply]
Re: Speeding up the DBI by nigelhorne (Initiate) on Jun 07, 2018 at 14:12 UTC
Great ideas, thank you. However my $rowcache; while (my $aref = shift(@$rowcache) \|\| shift ( @{$rowcache= $sth->fetchall_arrayref(undef, $max_rows)} ) ) fails (presumably when there is nothing left) with Can't use an undefined value as an ARRAY reference So some refactoring is neede.	[reply]
Re^2: Speeding up the DBI by hippo (Bishop) on Aug 03, 2022 at 11:04 UTC
See Re: Speeding up the DBI 🦛	[reply]