comment on

While I was trying to squeeze the most in means of speed/CPU utilization out of my code, I have stumbled upon a peculiar DBD::SQLite behavior with regards to bulk data insert. I'd like to illustrate it with the following snippet:

#!/usr/bin/perl

use strict;
use warnings;
use 5.14.0;

use Benchmark;
use DBI;

my $dbh = DBI->connect('dbi:SQLite::memory:', '','',
                       { AutoCommit => 1, RaiseError => 1, PrintError 
+=> 0,
                         ChopBlanks => 1 }) or die DBI->errstr;

$dbh->do("CREATE TABLE test_bulk_insert (\n" .
         (join ",\n", map { "field$_ int" } (1..$ARGV[0])) . ");\n") o
+r
                die $dbh->errstr;

my $sth = $dbh->prepare("INSERT INTO test_bulk_insert (\n" .
                        (join ",\n", map { "field$_" } (1..$ARGV[0])) 
+.
                        ') VALUES (' .
                        (join ",\n", map { '?' } (1..$ARGV[0])) . ')')
+ or
        die $dbh->errstr;
my @newdata;
push @newdata, [] for (1..$ARGV[0]);
$sth->bind_param_array($_ + 1, $newdata[$_]) for keys @newdata;

my $data;
$data .= sprintf(("%-4d"x$ARGV[0])."\n",
        map { int rand 9999 } (1..$ARGV[0]))
                for (1..$ARGV[1]);
my $template = "A4"x$ARGV[0];

timethese($ARGV[2], {

'new' => sub {
  my $buf = $data;
  my ($curr, $prev);

  while ($buf =~ /\n/g) {
    $prev = defined $curr ? $curr : 0;
    $curr = pos $buf;
    my $_ = substr $buf, $prev, $curr - $prev;
    chomp;
    my @cols = unpack $template;
    push $_, shift @cols for (@newdata);
  };

  eval {
     $dbh->do("BEGIN IMMEDIATE TRANSACTION");
     $dbh->do("DELETE FROM test_bulk_insert");
     $sth->execute_array({}, @newdata);
     $dbh->commit;
  } or die $@;

  $#{$newdata[$_]} = -1 for keys @newdata;
},

'old' => sub {
  my $buf = $data;
  my ($curr, $prev, @batch);

  while ($buf =~ /\n/g) {
    $prev = defined $curr ? $curr : 0;
    $curr = pos $buf;
    my $_ = substr $buf, $prev, $curr - $prev;
    chomp;
    push @batch, [unpack $template];
  };

  eval {
    $dbh->do("BEGIN IMMEDIATE TRANSACTION");
    $dbh->do("DELETE FROM test_bulk_insert");
    $sth->execute(@$_) for @batch;
    $dbh->commit;
  } or die $@;
},

});

$dbh->disconnect;
[download]

The 'new' routine tries to take advantage of DBI execute_array method but somehow it fails miserably: instead of being faster it's actually twice slower:

dwalin@sol:/tmp$ perl test.pl 100 1000 100
Benchmark: timing 100 iterations of new, old...
       new: 22 wallclock secs (22.10 usr +  0.28 sys = 22.38 CPU) @  4
+.47/s (n=100)
       old: 10 wallclock secs ( 9.87 usr +  0.08 sys =  9.95 CPU) @ 10
+.05/s (n=100)
[download]

And the cause of slowness is not my code, it's DBD::SQLite. Under Devel::NYTProf it looks most peculiar:

'old' line-by-line insert:

$sth->execute(@$_) for @batch;
 # spent  10.0ms making 10000 calls to DBI::st::execute, avg 1盜/call
[download]

'new' bulk insert:

 $sth->execute_array({}, @newdata);
# spent  3.17s making 10 calls to DBI::st::execute_array, avg 317ms/ca
+ll
# spent  3.17s making 10 calls to DBD::_::st::execute_array, avg 317ms
+/call
[download]

Something in Devel::NYTProf is visibly affecting execution; 'new' code becomes ten times slower than 'old' but the picture is the same - it's slower. Why is that? I tried to look inside DBD::SQLite but I'm not advanced enough to dig deep in DBI and its drivers.

I tested it with perl-5.14.1 under Solaris 9 x86.

Regards,
Alex.

In reply to DBD::SQLite bulk insert woes by dwalin

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.