ibm1620 has asked for the wisdom of the Perl Monks concerning the following question:

New to SQLite, very rusty on DBI and SQL. I'm trying to build a table of unique values, and I need to know their rowids so that other tables can refer to these values, and bring them together with a SELECT .. JOIN. Every time I store a value in the table, I need to first check to see if the value already exists -- if so, I need to obtain its rowid; if not, I need to insert it and then obtain its rowid.

This test program shows how I've tried to do it:

#!/usr/bin/env perl use v5.38; use DBI; my $memory; # = ':memory:'; my $dbname = $memory // '/Users/chap/private/data/music'; my $dbh = create_database(); # insert some songs. watch out for dupes. for my $song (qw{ Alpha Beta Gamma Alpha Delta Omicron }) { my $rowid = insert_unique( 'Songs', 'song', $song ); say "Inserting '$song', got rowid=$rowid"; } sub create_database() { unlink $dbname unless $memory; my $dbms = 'SQLite'; my $dbname = $dbname; my $user = ''; my $passwd = ''; my $dbh = DBI->connect( "DBI:$dbms:dbname=$dbname", $user, $passwd, { RaiseError => 1, AutoCommit => 1 } ) or die "Cannot connect: $DBI::errstr"; my $songs = <<~'EOF'; CREATE TABLE Songs (id INTEGER PRIMARY KEY, song VARCHAR(40) UNIQUE ) EOF $dbh->do($songs); $dbh->do("CREATE INDEX ix_song ON Songs (song)"); return $dbh; } # sub which conditionally inserts a unique value, and returns rowid sub insert_unique( $table, $column, $value ) { state %sths; # cache my prepared statement handles if ( !exists $sths{$table} ) { $sths{$table}{SELECT} = $dbh->prepare("SELECT id FROM $table WHERE $column = (?)") +; $sths{$table}{INSERT} = $dbh->prepare("INSERT INTO $table ($column) VALUES (?)"); } my $rowid = $sths{$table}{SELECT}->execute($value) || die; say "Value='$value': attempt to select yielded rowid=$rowid"; if ( $rowid == 0 ) { say "So value needs to be inserted."; $sths{$table}{INSERT}->execute($value); $rowid = $dbh->do("SELECT last_insert_rowid()"); } else { say "So value has been previously inserted."; } return $rowid; }
Running produces this output:
Value='Alpha': attempt to select yielded rowid=0E0 So value needs to be inserted. Inserting 'Alpha', got rowid=1 Value='Beta': attempt to select yielded rowid=0E0 So value needs to be inserted. Inserting 'Beta', got rowid=1 Value='Gamma': attempt to select yielded rowid=0E0 So value needs to be inserted. Inserting 'Gamma', got rowid=1 Value='Alpha': attempt to select yielded rowid=0E0 So value needs to be inserted. DBD::SQLite::st execute failed: UNIQUE constraint failed: Songs.song a +t /Users/chap/private/perl/insert_unique line 51. DBD::SQLite::st execute failed: UNIQUE constraint failed: Songs.song a +t /Users/chap/private/perl/insert_unique line 51.
Several things were not as expected: executing 'SELECT id...' returned 0e0 even when the value exists in the table (as in adding Alpha the 2nd time). And when the value needed to be inserted, the call to `last_insert_rowid()` always returned 1, rather than the most recently inserted rowid.

The following shows that the data was inserted into the table:

$ sqlite3 music SQLite version 3.43.2 2023-10-10 13:08:14 Enter ".help" for usage hints. sqlite> select * from Songs; 1|Alpha 2|Beta 3|Gamma
In addition to being baffled why this isn't working as I expected, I also wonder if there isn't a simpler way to accomplish this that I haven't thought of. I would imagine that this is a common situation when loading a normalized database.

Humbly yours,

1620

Replies are listed 'Best First'.
Re: SQLite: INSERT into a unique column and retrieve rowid
by Corion (Patriarch) on May 04, 2024 at 06:23 UTC

    ->execute($value) does not return the rowid.

    See last_insert_id of DBI on how to use ->last_insert_id. Also, you can use the RETURNING SQL clause:

    my $sql = <<"SQL"; INSERT INTO $table ($column) VALUES (?) RETURNING id SQL my $sth = $dbh->prepare($sql); $sth->execute(...); my $rows = $sth->fetchall_arrayref( {} ); say "New id: " . $rows->[0]->{id};
Re: SQLite: INSERT into a unique column and retrieve rowid
by Marshall (Canon) on May 04, 2024 at 12:30 UTC
    I am working on an SQlite project now.

    From my code... You can call last_insert_id with no args (all undef) which probably would be fine in your app. Or in my code I asked specifically about the ScoreCard Table.

    my $scoreCardrow = $dbh->last_insert_id(undef,undef,"ScoreCard",undef) +; # last_insert_id($catalog, $schema, $table, $field);
    On another point, from your code:
    CREATE TABLE Songs (id INTEGER PRIMARY KEY,
    You do not need this line. SQLite cannot generate a table without a unique row id. If that line is missing, SQLite generates:
    rowid        INTEGER PRIMARY KEY on its own. What you have is ok, but it generates an alias for the PRIMARY KEY - id is an alias for rowid.

    Update: Complications can occur with multiple writers, but I don't think that is an issue with your code.

    my $rowid = $sths{$table}{SELECT}->execute($value) || die;
    I think we covered how to get last rowid. You do not need "|| die" because you have RaiseError =1 which is what I recommend.

    Another fine point, 0E0 this is the DB's way of saying logical "true", but numeric "zero".

    $dbh->do("CREATE INDEX ix_song ON Songs (song)");
    I would leave the index off completely to start with and see how the performance is. You may be surprised at how well SQLite does on its own. The index adds considerably to the time for each insert. There are weird things that can happen in complex tables if you "overindex" them. This confuses the execution strategy engine and you can wind up with slower code! Anyway be careful.

Re: SQLite: INSERT into a unique column and retrieve rowid
by ibm1620 (Hermit) on May 04, 2024 at 16:14 UTC
    Using the above suggestions (thank you!), I rewrote the central subroutine as follows:
    sub insert_unique( $table, $column, $value ) { state %sths; if ( !exists $sths{$table} ) { $sths{$table}{SELECT} = $dbh->prepare("SELECT id FROM $table WHERE $column = (?)") +; $sths{$table}{INSERT} = $dbh->prepare("INSERT INTO $table ($column) VALUES (?) RET +URNING id"); } my $rowid; my $sth = $sths{$table}{SELECT}; $sth->execute($value); my $rows = $sth->fetchall_arrayref( {} ); if ($rows->@*) { $rowid = $rows->[0]->{id}; } else { $sth = $sths{$table}{INSERT}; $sth->execute($value); $rows = $sth->fetchall_arrayref( {} ); $rowid = $rows->[0]->{id}; } return $rowid; }
    Which works.

    It does seem to me a lot of code to accomplish a seemingly basic requirement. I'm implementing UNIQUEness, for one thing. But it works, and it's pretty fast adding 60,000 songs. (I didn't notice much difference whether or not the column was indexed.)

      It does seem to me a lot of code

      It could be streamlined for sure, particularly if you make the database enforce the uniqueness constraint (as in the OP) and just trap the exceptions. But really the worrying part is that $table and $column are SQL injections just waiting to happen.

      Also the fact that you've used a state hash to hold the prepared statement handles means that these variables only have relevance on the first invocation, which is an unusual situation. eg. call it once on table foo and then again on table bar and the second one will still use table foo. That's confusing at best. (edit: ignore this part)


      🦛

        Regarding SQL injection -- had to look it up. Is my use of variables $table and $column in those prepared SELECTs necessarily dangerous? I can see how they could be if an enduser were prompted for them, but that's not how I intend to populate them. It's for a specific application with specific table and column names that are hardcoded into the program. As I say, I'm very much a novice at SQL apps. Maybe there's more to this than I'm aware of ... ? Thanks.
        I don't understand your comment about the state hash. I copied this subroutine into my actual program, which calls insert_unique() repeatedly for six different tables, and it worked correctly.
      I don't know what "it's pretty fast adding 60,000 songs" means. I would add the line $dbh->begin_work; before the loop that puts the 60,000 things into the DB. And the line $dbh->comitt; after the loop is over. This will run all 60,000 inserts as a single transaction. There should be a very noticeable decrease in execution time.
        I thought you were saying that the addition of an index to the table would dramatically increase the time it took to load the table. Without begin_work / commit, it took 23 seconds to load the table with an index, and 20 seconds without one. However, running it as a single transaction as you suggest, those numbers drop to 0.77 and 0.72.

        I haven't run any comparisons to see how much the index benefits retrieval, however...