I modified your test slightly to run it against two more databases, PostgreSQL and MySQL, in addition to SQLite. The result is the same for all of them: your "new" approach is slower than the "old" one:
./execute_cmp.pl 100 1000 100
Benchmark: timing 100 iterations of SQLite_new, SQLite_old...
SQLite_new: 15 wallclock secs (15.27 usr + 0.07 sys = 15.34 CPU) @  6.52/s (n=100)
SQLite_old:  5 wallclock secs ( 5.91 usr + 0.03 sys =  5.94 CPU) @ 16.84/s (n=100)
Benchmark: timing 100 iterations of PostgreSQL_new, PostgreSQL_old...
PostgreSQL_new: 66 wallclock secs (26.55 usr + 1.75 sys = 28.30 CPU) @  3.53/s (n=100)
PostgreSQL_old: 37 wallclock secs ( 7.67 usr + 1.14 sys =  8.81 CPU) @ 11.35/s (n=100)
Benchmark: timing 100 iterations of MySQL_new, MySQL_old...
MySQL_new: 58 wallclock secs (26.45 usr + 1.43 sys = 27.88 CPU) @  3.59/s (n=100)
MySQL_old: 30 wallclock secs ( 8.09 usr + 0.85 sys =  8.94 CPU) @ 11.19/s (n=100)
Maybe it's time to check our assumptions. Is it really a given that execute_array() is supposed to be faster? If the driver provides an optimized implementation, then I would say yes. The DBI docs say "Drivers *may* provide more optimized implementations using whatever bulk operation support the database API provides" (emphasis mine). But what about these three databases/drivers, none of which appear to provide such an optimization?
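To make that concrete, here is roughly what the generic fallback amounts to when the driver offers no bulk support. This is only a conceptual sketch of my own (the sub name is made up, and it ignores ArrayTupleStatus, ArrayTupleFetch and error handling; it is not DBI's actual source): it walks the per-placeholder arrays and calls plain execute() once per row.

use strict;
use warnings;

# Conceptual sketch of a driverless execute_array() fallback -- NOT DBI's
# actual implementation. @bind_columns holds one arrayref per placeholder;
# we slice out the i-th value of each column and run a normal execute().
sub naive_execute_array {
    my ($sth, $attr, @bind_columns) = @_;
    my $rows   = @bind_columns ? scalar @{ $bind_columns[0] } : 0;
    my $tuples = 0;
    for my $i (0 .. $rows - 1) {
        my @row = map { $_->[$i] } @bind_columns;   # i-th value of every column
        $sth->execute(@row) and $tuples++;
    }
    return $tuples;
}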
When execute_array() just has to call execute() behind the scenes, is it really expected to be 3X slower? As far as I can tell, the only difference *should* be that execute_array() has to copy a large array (of arrayrefs), but that doesn't seem like enough to cause that much of a slowdown.
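One way to sanity-check that is to take the database out of the picture entirely and time only the Perl-level copying. A rough sketch (the sub name and the 100x1000 sizes are mine, chosen to match the run above):

use strict;
use warnings;
use Benchmark qw(cmpthese);

# 100 columns of 1000 values each, roughly the shape execute_array() sees
my @columns = map { [ map { int rand 9999 } 1 .. 1000 ] } 1 .. 100;

sub swallow { my @copy = @_; scalar @copy }     # forces the argument-list copy

cmpthese(-2, {
    copy_refs => sub { swallow(@columns) },            # cost of passing the arrayrefs
    no_copy   => sub { my $n = 0; $n++ for @columns; $n },
});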
I think more investigation is necessary.
Here is your test with the modifications I made:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.14.0;
use Benchmark;
use DBI;

my @databases = (
    { name => 'SQLite',     dsn => 'dbi:SQLite::memory:' },
    { name => 'PostgreSQL', dsn => 'dbi:Pg:dbname=test_execute' },
    {
        name => 'MySQL',
        dsn  => 'dbi:mysql:test_execute',
        user => 'test_execute',
        pass => 'password',
    },
);

for my $database (@databases) {
    my $dbh = DBI->connect(
        $database->{dsn}, $database->{user}, $database->{pass},
        { AutoCommit => 1, RaiseError => 1, PrintError => 0, ChopBlanks => 1 }
    ) or die DBI->errstr;

    # one int column per field; column count comes from the first argument
    $dbh->do("CREATE TABLE test_bulk_insert (\n"
            . (join ",\n", map {"field$_ int"} (1 .. $ARGV[0]))
            . ");\n") or die $dbh->errstr;

    my $sth = $dbh->prepare("INSERT INTO test_bulk_insert (\n"
            . (join ",\n", map {"field$_"} (1 .. $ARGV[0]))
            . ') VALUES ('
            . (join ",\n", map {'?'} (1 .. $ARGV[0]))
            . ')') or die $dbh->errstr;

    # one arrayref per column, bound once up front for execute_array()
    my @newdata;
    push @newdata, [] for (1 .. $ARGV[0]);
    $sth->bind_param_array($_ + 1, $newdata[$_]) for keys @newdata;

    # build the raw test data: $ARGV[1] lines of $ARGV[0] fixed-width numbers
    my $data;
    $data .= sprintf(("%-4d" x $ARGV[0]) . "\n",
        map { int rand 9999 } (1 .. $ARGV[0])) for (1 .. $ARGV[1]);
    my $template = "A4" x $ARGV[0];

    timethese(
        $ARGV[2],
        {
            "$database->{name}_new" => sub {
                my $buf = $data;
                my ($curr, $prev);
                while ($buf =~ /\n/g) {
                    $prev = defined $curr ? $curr : 0;
                    $curr = pos $buf;
                    my $line = substr $buf, $prev, $curr - $prev;
                    chomp $line;
                    my @cols = unpack $template, $line;
                    push @{$_}, shift @cols for @newdata;
                }
                eval {
                    $dbh->begin_work();
                    $dbh->do("DELETE FROM test_bulk_insert");
                    $sth->execute_array({}, @newdata);
                    $dbh->commit;
                } or die $@;
                # empty the column buffers for the next iteration
                $#{ $newdata[$_] } = -1 for keys @newdata;
            },
            "$database->{name}_old" => sub {
                my $buf = $data;
                my ($curr, $prev, @batch);
                while ($buf =~ /\n/g) {
                    $prev = defined $curr ? $curr : 0;
                    $curr = pos $buf;
                    my $line = substr $buf, $prev, $curr - $prev;
                    chomp $line;
                    push @batch, [unpack $template, $line];
                }
                eval {
                    $dbh->begin_work();
                    $dbh->do("DELETE FROM test_bulk_insert");
                    $sth->execute(@$_) for @batch;
                    $dbh->commit;
                } or die $@;
            },
        }
    );

    $dbh->disconnect;
}
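For the record, the three arguments are the number of columns, the number of rows per batch, and the number of benchmark iterations, so the run above was 100 columns x 1000 rows x 100 iterations. If anyone wants to dig into where the time actually goes, DBI's bundled profiler is probably the quickest route; a rough sketch (the exact report format depends on your DBI version):

# Break DBI's time down per method (execute vs execute_array, etc.);
# the summary is normally printed to STDERR when the program exits.
# Equivalent from the shell: DBI_PROFILE='!MethodName' ./execute_cmp.pl 10 100 5
$dbh->{Profile} = '!MethodName';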
--Daniel