Xenofur has asked for the wisdom of the Perl Monks concerning the following question:



Edit: Figured the problem out myself; I was misusing the bind command, since it's used to bind arrays of data to the columns rather than whole datasets. If anyone has speed suggestions, I'd still like to hear 'em though.
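For the record, here's a minimal sketch of the shape bind_param_array actually expects: one array per placeholder, holding that column's values for every row (the @orderids/@volumes/@prices names are illustrative, not from my script):

# one array per placeholder, each holding ALL rows' values for that column
$sth->bind_param_array( 1, \@orderids );
$sth->bind_param_array( 2, \@volumes );
$sth->bind_param_array( 3, \@prices );
# then a single execute_array call runs the statement once per row
$sth->execute_array( { ArrayTupleStatus => \my @tuple_status } );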



Greetings monks,

I am trying to do a large number of inserts (30 chunks of ~200_000 rows each) into an SQLite database and am looking for ways to speed this up, since doing a single insert per row is pitifully slow.

So far I've got this:
############################################################
my $csv_fh = Tie::Handle::CSV->new(
    csv_parser   => Text::CSV_XS->new( { allow_whitespace => 1 } ),
    file         => $unpackedname,
    simple_reads => 1,
);

$dbh->do("BEGIN;");

my $sth = $dbh->prepare(
    "REPLACE INTO orders ( orderid, volume, price ) VALUES (?,?,?);"
);

while ( my $order = <$csv_fh> ) {
    $sth->execute( $order->{orderid}, $order->{volremain}, $order->{price} );
}

close $csv_fh;
$dbh->do("END;");
############################################################

Now I was hoping to make it a bit faster by using execute_array. The code for that looks as follows and matches the example given in the DBI documentation:
############################################################
my $csv_fh = Tie::Handle::CSV->new(
    csv_parser   => Text::CSV_XS->new( { allow_whitespace => 1 } ),
    file         => $unpackedname,
    simple_reads => 1,
);

$dbh->do("BEGIN;");

my $sth = $dbh->prepare(
    "REPLACE INTO orders ( orderid, volume, price ) VALUES (?,?,?);"
);

my $insert_count = 0;
while ( my $order = <$csv_fh> ) {
    # (the mistake discussed below: each row gets bound to a new
    # parameter index, instead of one array per column)
    $sth->bind_param_array(
        ++$insert_count,
        [ $order->{orderid}, $order->{volremain}, $order->{price} ]
    );
}
close $csv_fh;

$sth->execute_array( { ArrayTupleStatus => \my @tuple_status } );
$dbh->do("END;");
############################################################

However, that produces this error message:
DBD::SQLite::st execute_array failed: 84490 bind values supplied but 3 expected

Did I do something wrong, or is this a limitation of SQLite? Does anyone have suggestions for other ways to speed this up?

Re: Mass inserts with SQLite + CSV + execute_array?
by Tanktalus (Canon) on Dec 14, 2008 at 15:08 UTC

    From my reading of the DBI docs, you kind of have that backwards.

    my @params;
    while ( my $order = <$csv_fh> ) {
        # collect one array of values per column, not one array per row
        push @{ $params[0] }, $order->{orderid};
        push @{ $params[1] }, $order->{volremain};
        push @{ $params[2] }, $order->{price};
    }
    $sth->bind_param_array( 1, $params[0] );
    $sth->bind_param_array( 2, $params[1] );
    $sth->bind_param_array( 3, $params[2] );
    # then fire it all off in one call
    $sth->execute_array( { ArrayTupleStatus => \my @tuple_status } );
    That looks closer to what DBI says. Also, make sure you have AutoCommit off; that helps with most DBs, though I've not tried SQLite yet.

    (Yes, that code can be optimised, and likely even shortened, but it's too early in the morning here, so I thought I'd just try to convey the message and leave it as an exercise for the reader to clean it up :-P )

      Yeah, as noted in the edit up top, I noticed that. Aside from using a hash, I'm now using exactly the same code as you.

      As for AutoCommit, that's what the BEGIN/END pair is there for; it does the same thing, only locally constrained.

        I wasn't aware that DBI used BEGIN and END blocks to temporarily disable AutoCommit. I would suggest actually setting AutoCommit => 0 in your DBI->connect statement if you aren't already.
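        Something like this, for instance (just a sketch; the DSN and file name are placeholders):

        my $dbh = DBI->connect(
            "dbi:SQLite:dbname=orders.db", "", "",
            { AutoCommit => 0, RaiseError => 1 },
        ) or die $DBI::errstr;

        # ... all the inserts run inside one implicit transaction ...

        $dbh->commit;    # one commit for the whole batch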

Re: Mass inserts with SQLite + CSV + execute_array?
by hangon (Deacon) on Dec 14, 2008 at 20:53 UTC
    ... If anyone has speed suggestions, I'd still like to hear 'em though.

    I asked a similar question recently. Check out this node, Loading bulk data into SQLite, and its replies for some suggestions on the speed issue.
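    If memory serves, the usual SQLite knobs for bulk loads look something like this (a sketch only; both trade crash safety for speed, so use with care):

    # don't fsync on every commit -- a power failure can corrupt the db
    $dbh->do("PRAGMA synchronous = OFF;");
    # keep the rollback journal in RAM instead of on disk
    $dbh->do("PRAGMA journal_mode = MEMORY;");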

      Thanks for that. While most of the findings there match mine, there's still the point about indexes. I found out in "SQLite Administrator" that, although orderid is the primary key, there was no index on it. Maybe that's because I created it as "orderid INTEGER PRIMARY KEY UNIQUE NOT NULL," instead of declaring the primary key explicitly in the last line of the CREATE query.

      Edit: Huh, this is weird. Neither way of creating it actually produces an index... What's wrong with this query?
      CREATE TABLE IF NOT EXISTS moo3 (
          orderid INTEGER PRIMARY KEY NOT NULL,
          volume  INTEGER NOT NULL,
          price   REAL    NOT NULL
      )
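      Answering myself after some digging: as far as I can tell, nothing is wrong with the query. In SQLite an INTEGER PRIMARY KEY column is an alias for the built-in rowid, so the table itself is stored keyed on it; no separate index object ever shows up, but lookups on orderid are fast anyway. Only other columns need an explicit index, e.g. (hypothetical index name and column):

      # orderid needs no index -- it IS the rowid; other columns do, e.g.:
      $dbh->do("CREATE INDEX IF NOT EXISTS idx_orders_price ON orders (price);");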
Re: Mass inserts with SQLite + CSV + execute_array?
by CountZero (Bishop) on Dec 14, 2008 at 18:02 UTC
    Just out of sheer curiosity, what was the speed-up between the original "single inserts" and the "mass inserts"?

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      I didn't do any serious benchmarking, but the execute_array version brought no noticeable improvement, so I reverted to the form above.

      I can, however, say that using the BEGIN;/END; pair gave at least a tenfold speedup, but I think that's mostly an SQLite thing.
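      For reference, the same effect can be had through DBI's transaction methods instead of raw SQL (a sketch of the loop from above):

      $dbh->begin_work;    # turns AutoCommit off until commit/rollback
      while ( my $order = <$csv_fh> ) {
          $sth->execute( $order->{orderid}, $order->{volremain}, $order->{price} );
      }
      $dbh->commit;        # everything lands in one transaction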