in reply to Re^2: return primary key if duplicate entry exists?
in thread return primary key if duplicate entry exists?

actually couldn't I allow it to replace the duplicate with the duplicate and that way last_insert_id will return that row id for me?
  • Comment on Re^3: return primary key if duplicate entry exists?

Replies are listed 'Best First'.
Re^4: return primary key if duplicate entry exists?
by poj (Abbot) on Jan 24, 2016 at 16:07 UTC

    Here's a working example. It does return the id either of the new record or the existing record. Unfortunately it also appears to change the auto-increment value with every update (even when no new records are inserted) so id's may not be continuous.

    From the mysql docs :  In general, you should try to avoid using an ON DUPLICATE KEY UPDATE clause on tables with multiple unique indexes.

    #!\C:\Strawberry\perl\bin\perl use strict; use warnings; use DBI; # CREATE TABLE test2 ( # id int(11) NOT NULL AUTO_INCREMENT, # f1 varchar(45) DEFAULT NULL, # PRIMARY KEY (id), # UNIQUE KEY f1_UNIQUE (f1) # ) ENGINE=InnoDB AUTO_INCREMENT=27 DEFAULT CHARSET=utf8; my $table = 'test2'; my $col = 'f1'; # unique key my $entry = $ARGV[0] || 1; #PERL DBI CONNECT my $dbh = dbh(); my $sql = sprintf ' INSERT INTO %s (%s) VALUES (?) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id)', $dbh->quote_identifier($table), $dbh->quote_identifier($col) ; my $sth = $dbh->prepare($sql); my $rv = $sth->execute($entry); print $rv.' '.$dbh->last_insert_id(undef,undef,$table,$col); sub dbh{ my $database = "test"; my $user = "user"; my $pw = "password"; my $dsn = "dbi:mysql:$database:localhost:3306"; my $dbh = DBI->connect($dsn, $user, $pw, { RaiseError=>1, AutoCommit=>1 } ); return $dbh; }
    poj
      What is the use of 'UPDATE id = LAST_INSERT_ID(id)'?

      According to the MySQL docs LAST_INSERT_ID(id) will return the value of the argument, so you are actually replacing the existing value in the field 'id' by itself and then you are using that value as the next auto-incremented key value. That is likely to create even more errors as now the database doesn't know anymore where it was in the sequence it was using and will happily re-use an already existing number.

      if expr is given as an argument to LAST_INSERT_ID(), the value of the argument is returned by the function and is remembered as the next value to be returned by LAST_INSERT_ID().

      CountZero

      A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      My blog: Imperial Deltronics
        After thinking a little bit. I think instead of the basic auto-increment key I will switch to using crc32. I realized this will make things easier as essentially the key is embedded in the data, and I will not need to do a SELECT..WHERE query.
      thanks! actually I didn't realize you can use placeholders with quote identifiers. For some reason I thought they had to be used separately.

      So I didn't fully comprehend my own code as some of it was borrowed. If I understand the code correctly, id will be equal to the new auto-increment id, which also means I can call that id from last_insert_id()? Also if it wasn't a duplicate, the last_insert_id() would also have the new key? So basically, with this I can always rely on last_insert_id() to give me the key?
Re^4: return primary key if duplicate entry exists?
by CountZero (Bishop) on Jan 24, 2016 at 18:45 UTC
    Actually, if you perform an INSERT which results in a duplicate key error, then by definition you must know the key already. How else could you have inserted that record otherwise?

    The last_insert_id is only useful with auto-incrementing keys (and even then I find it rarely needed), but in any case auto-incrementing keys should never give a duplicate key error.

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
      that is a good point, and probably a good reason to use unique values as primary keys. I guess I am slightly confused on database design then. I am working with DNA sequences and they can be quite long strings. So I am using a crc32 to check for uniqueness. Would it still be a good idea to use that as a primary key? I merely assumed that was not an ideal key?
        CRC32 should only be used as a checksum to verify integrity of the data, but not as a key since there is no guarantee whatsoever that it will be unique for each different input and the result is only 32 bits long (4 bytes).

        What you need is a message digest. Have a look at Digest::SHA1. The digest function will return a 20 byte binary or 40 byte hexadecimal result that still isn't guaranteed to be unique for each different input but given that its result is now 160 bits long, the risk of a collision (i.e. the same digest value for a different input) is much smaller. Anyhow, if two DNA sequences have a different digest value, they are guaranteed to be different. If two sequences have the same digest value, they can still be different (this is called "collision") and you should check the full DNA sequence to make sure they are different or not.

        But as even the SHA1 digest does not guarantee "uniqueness" it cannot be used as a key in your database. In such cases, you should think of an auto-incrementing primary key and save both the full DNA sequence and its digest in the database. The digest can be used as an index to quickly check if the full DNA sequence is unique or already known and stored in the database. If you find a duplicate digest value then you must check the full DNA sequence to make sure it is not a (rare) collision case.

        CountZero

        A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        My blog: Imperial Deltronics