Re: [OT] SQL "on duplicate key" custom action
by Corion (Patriarch) on Oct 21, 2019 at 07:36 UTC
I think the "best" approach (if your DBMS supports it) is an ON UPDATE trigger, or, if that import is a one-time thing only, creating a unique key across all columns:
create or replace function skip_insert() returns trigger language plpgsql as $$
begin
    return null;
end $$;

create or replace function raise_exception() returns trigger language plpgsql as $$
begin
    RAISE EXCEPTION 'UPDATE changing row values not allowed';
end $$;

CREATE TRIGGER mytrigger ON records INSTEAD OF UPDATE
    WHEN old.PriKey = new.PriKey and old.Foo=new.Foo and old.Bar=new.Bar
    FOR EACH ROW
    EXECUTE PROCEDURE skip_insert();

CREATE TRIGGER mytrigger ON records INSTEAD OF UPDATE
    WHEN old.PriKey = new.PriKey and (old.Foo!=new.Foo or old.Bar!=new.Bar)
    FOR EACH ROW
    EXECUTE PROCEDURE raise_exception();
... but this approach does not handle NULL values.
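That NULL caveat is easy to demonstrate: plain `=` never matches two NULLs, which is why an equality-based duplicate check silently misbehaves. A quick illustrative sketch using Python's built-in sqlite3 (the semantics are the same in Postgres, where the NULL-safe spelling is IS NOT DISTINCT FROM):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (i.e. not true), so equality-based
# duplicate detection skips rows that contain NULLs.
eq, null_safe = con.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()

print(eq)         # None: the comparison itself is NULL
print(null_safe)  # 1: IS (like Postgres' IS NOT DISTINCT FROM) treats NULLs as equal
```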
CREATE FUNCTION skip_update() RETURNS trigger AS $$
    BEGIN
        RETURN NULL;
    END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION fail_update() RETURNS trigger AS $$
    BEGIN
        RAISE EXCEPTION 'can''t modify row values for existing primary key';
    END;
$$ LANGUAGE plpgsql;

CREATE TABLE Dummy_Test (
    Timestamp TIMESTAMP WITH TIME ZONE PRIMARY KEY,
    Foo TEXT,
    Bar TEXT
);

CREATE TRIGGER Dummy_Test_dupe BEFORE UPDATE ON Dummy_Test FOR EACH ROW
    WHEN ( OLD.Timestamp IS NOT DISTINCT FROM NEW.Timestamp AND OLD.Foo IS NOT DISTINCT FROM NEW.Foo AND OLD.Bar IS NOT DISTINCT FROM NEW.Bar )
    EXECUTE FUNCTION skip_update();

CREATE TRIGGER Dummy_Test_modify BEFORE UPDATE ON Dummy_Test FOR EACH ROW
    WHEN ( OLD.Foo IS DISTINCT FROM NEW.Foo OR OLD.Bar IS DISTINCT FROM NEW.Bar )
    EXECUTE FUNCTION fail_update();

INSERT INTO Dummy_Test (Timestamp,Foo,Bar) VALUES ('2020-01-02 12:34','Hello','World')
    ON CONFLICT ON CONSTRAINT Dummy_Test_pkey DO UPDATE SET Foo=EXCLUDED.Foo, Bar=EXCLUDED.Bar;
INSERT INTO Dummy_Test (Timestamp,Foo,Bar) VALUES ('2020-01-02 12:34','Hello','World')
    ON CONFLICT ON CONSTRAINT Dummy_Test_pkey DO UPDATE SET Foo=EXCLUDED.Foo, Bar=EXCLUDED.Bar;
INSERT INTO Dummy_Test (Timestamp,Foo,Bar) VALUES ('2020-01-02 12:34','Hello','abcde')
    ON CONFLICT ON CONSTRAINT Dummy_Test_pkey DO UPDATE SET Foo=EXCLUDED.Foo, Bar=EXCLUDED.Bar;
The last statement will fail, which is exactly what I wanted. The skip_update() function and the Dummy_Test_dupe trigger really only prevent the UPDATE from happening, so they can be omitted - I haven't yet tested whether it's more performant to have the TRIGGER fire or to let the UPDATE run. In any case, thank you very much!
use warnings;
use strict;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", undef, undef,
    { RaiseError=>1, AutoCommit=>1 } );
$dbh->do(<<'ENDSQL');
CREATE TABLE Dummy_Test (
    Timestamp INTEGER PRIMARY KEY,
    Foo TEXT,
    Bar TEXT
);
ENDSQL
$dbh->do(<<'ENDSQL');
CREATE TRIGGER Dummy_Test_dupe BEFORE UPDATE ON Dummy_Test FOR EACH ROW
    WHEN ( OLD.Timestamp IS NEW.Timestamp AND OLD.Foo IS NEW.Foo AND OLD.Bar IS NEW.Bar )
    BEGIN SELECT RAISE(IGNORE); END;
ENDSQL
$dbh->do(<<'ENDSQL');
CREATE TRIGGER Dummy_Test_modify BEFORE UPDATE ON Dummy_Test FOR EACH ROW
    WHEN ( OLD.Foo IS NOT NEW.Foo OR OLD.Bar IS NOT NEW.Bar )
    BEGIN SELECT RAISE(ABORT, "same Timestamp but different values"); END;
ENDSQL
my $in = $dbh->prepare(<<'ENDSQL');
INSERT INTO Dummy_Test (Timestamp,Foo,Bar) VALUES (?,?,?)
    ON CONFLICT (Timestamp) DO UPDATE SET Foo=EXCLUDED.Foo, Bar=EXCLUDED.Bar;
ENDSQL
$in->execute(12345,'Hello','World');
$in->execute(12345,'Hello','World');
my $e;
eval { $in->execute(12345,'Hello','abcde'); 1 } or do { $e = $@ };
if ( defined $e ) { print "Third insert failed as expected: $e" }
else { die "Third insert didn't fail as expected" }
The skip_update() function and the Dummy_Test_dupe trigger really only prevent the UPDATE from happening, so they can be omitted - I haven't yet tested whether it's more performant to have the TRIGGER fire or to let the UPDATE run.
Interestingly, it made a small but consistent difference. I imported the same dataset (73,613 INSERTs) three times into a previously populated database: without the TRIGGER, i.e. with the UPDATE firing, it took an average of 91.540s, and with the TRIGGER, i.e. with the UPDATE prevented, it took 99.368s (each figure +/- about 2s) - a difference of roughly 0.1ms per record. Not a big deal here, and not a rigorous test, but still interesting.
Update: Importing into a previously empty database showed only a small difference as well: overall the import was roughly 10s faster, with the UPDATE again beating the TRIGGER, this time by 0.17ms per record. Again, not really significant.
Do you perhaps have a tested version? I can't get this to run.
The trigger functions are all right, it seems, but I don't see how to create the triggers themselves on the table 'records' so that they react in the desired way. (For one thing, you can't create an INSTEAD OF UPDATE trigger on a plain table, and with a BEFORE INSERT or AFTER INSERT trigger, the old. and new. business is not going to work.)
No, sorry - I don't have a tested version, and I'm not on the machine where I did the searches.
I worked from the Postgres 12 documentation and some Stack Overflow responses, but I can't find them now :( I think it should be a BEFORE UPDATE trigger, and the statement would maybe be an insert-or-update. But again, I didn't test any of this, sorry.
An index across the complete table will eat up lots of disk space, and also will make each insert slow(er). Think of an index as (somewhat structured) hash keys, where you can also quickly look up strings starting with a specific string.
If your table is largely read-only and you have the disk space to spare, the index will speed up queries maybe a bit more than an index only on the primary keys, because the DB can satisfy the query completely from the index without hitting the table storage.
The triggers will slow down all UPDATE statements but leave SELECT untouched and also will not use additional storage.
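The "satisfy the query completely from the index" point is easy to see with SQLite's EXPLAIN QUERY PLAN (table and column names below are made up for illustration): when an index contains every column a query touches, the plan reports a COVERING INDEX and the table storage is never read.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE records (a INT, b INT, c INT)")
con.execute("CREATE UNIQUE INDEX records_all ON records (a, b, c)")

# The index holds a, b and c, so this query never touches the table itself;
# the plan's detail column (last field) names the covering index.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT b, c FROM records WHERE a = 1"
).fetchone()[-1]
print(plan)  # e.g. "SEARCH records USING COVERING INDEX records_all (a=?)"
```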
Re: [OT] SQL "on duplicate key" custom action
by daxim (Curate) on Oct 21, 2019 at 09:30 UTC
MySQL/MariaDB has always been a pain in the butt:
… and still is. I've recently lost a lot of time with MariaDB 10.3 due to:
- DDL statements not running in transactions, leaving the DB in an inconsistent state
- json_object returning a string, not a nested data structure, so it won't compose
- the json type being silently downgraded to longtext
Would never recommend. Use PostgreSQL and save your sanity.
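For what it's worth, transactional DDL is easy to check for yourself: SQLite (like PostgreSQL, and unlike MySQL/MariaDB) rolls DDL back together with the transaction. A minimal sketch via Python's built-in sqlite3:

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction can be managed explicitly with BEGIN/ROLLBACK.
con = sqlite3.connect(":memory:", isolation_level=None)

con.execute("BEGIN")
con.execute("CREATE TABLE t (x INTEGER)")
con.execute("ROLLBACK")

# The CREATE TABLE was rolled back along with the transaction,
# so the table no longer appears in the catalog.
count = con.execute(
    "SELECT count(*) FROM sqlite_master WHERE name = 't'"
).fetchone()[0]
print(count)  # 0
```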
Re: [OT] SQL "on duplicate key" custom action
by erix (Prior) on Oct 21, 2019 at 09:43 UTC
drop table if exists testtable;
create table testtable (pk int primary key, foo text, bar text);
create unique index foobar_idx on testtable (pk, foo, bar);

INSERT INTO testtable (pk, foo, bar) VALUES (1, 'Hello', 'World') ON CONFLICT (pk,foo,bar) DO NOTHING;
INSERT INTO testtable (pk, foo, bar) VALUES (1, 'Hello', 'World') ON CONFLICT (pk,foo,bar) DO NOTHING;
INSERT INTO testtable (pk, foo, bar) VALUES (1, 'Hello', 'Quz' ) ON CONFLICT (pk,foo,bar) DO NOTHING;
which yields:
drop table if exists testtable;
DROP TABLE
create table testtable (pk int primary key, foo text, bar text);
CREATE TABLE
create unique index foobar_idx on testtable (pk, foo, bar);
CREATE INDEX
INSERT INTO testtable (pk, foo, bar) VALUES (1, 'Hello', 'World') ON CONFLICT (pk,foo,bar) DO NOTHING;
INSERT 0 1
INSERT INTO testtable (pk, foo, bar) VALUES (1, 'Hello', 'World') ON CONFLICT (pk,foo,bar) DO NOTHING;
INSERT 0 0
INSERT INTO testtable (pk, foo, bar) VALUES (1, 'Hello', 'Quz' ) ON CONFLICT (pk,foo,bar) DO NOTHING;
ERROR: duplicate key value violates unique constraint "testtable_pkey"
DETAIL: Key (pk)=(1) already exists.
Postgres 12.1 (but this works from 9.5.x onward)
Manual PostgreSQL 12.1: INSERT # SQL-ON-CONFLICT
update: The first INSERT should be the same, of course, fixed. Also, removed vestiges of the original bash-version.
update 2: Re 'MySQL vs. Postgres': I think open-sourced and completely-free PostgreSQL is a better/safer choice than proprietary Oracle MySQL. For one thing, over time Pg will get better -- even better than Oracle -- but how could Oracle ever allow cheapo MySQL to overtake its venerable and expensive Oracle database product?
Thank you very much for your suggestion; it turns out that some of my tables have a lot of columns, and I ran into "ERROR: cannot use more than 32 columns in an index". Since Corion's suggestion should, I think, use less disk space, I'll be going with my version of that.
I tried to get that trigger-version working but I could not (well, not in a satisfactory way). I'm beginning to think it's not possible -- so if you hit upon a trigger-solution, I'd very much like to know. Thanks!
Re: [OT] SQL "on duplicate key" custom action
by cavac (Parson) on Oct 21, 2019 at 15:12 UTC
If you have the choice, choose PostgreSQL. It might not matter in the beginning, but projects tend to grow in complexity and PostgreSQL is a much better fit when it comes to complex data handling tasks and keeping your data integrity intact.
Also, if you like PerlMonks, you will most likely enjoy working with the PostgreSQL community.
perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'
There are a lot of places to look. https://www.postgresql.org/community/ is as good a place to start as any. The mailing lists in particular (yup, those things still exist) are very helpful; they're pretty much the equivalent of PerlMonks, in my opinion.
If you need semi-urgent help or just want to hang out, #postgresql on irc.freenode.net is also a good place. It saved my bacon on a couple of occasions.
Last, but not least, if you need commercial PostgreSQL help for a big project, PostgreSQL has you covered: https://www.postgresql.org/support/professional_support/
There's IRC: the #postgresql channel on libera.chat, if you like that kind of thing. It's especially like latter-day PerlMonks in that there are sometimes long silences, but then suddenly also intensely interesting conversations, use cases, and problem cases -- and even when it's silent, there is always much expertise around.
[1] https://www.postgresql.org/community/irc/
Q> What's the Postgres equivalent to PerlMonks?
A> EXPLAIN
Re: [OT] SQL "on duplicate key" custom action
by perlfan (Vicar) on Oct 13, 2021 at 01:02 UTC
It depends on which DB, but to "ignore" the duplicate it's basically the following (MySQL's ON DUPLICATE KEY syntax):
INSERT
  INTO records (PriKey, Foo, Bar)
VALUES (1, 'Hello', 'World')
    ON DUPLICATE KEY
UPDATE Foo=Foo;
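ON DUPLICATE KEY is MySQL-only; in PostgreSQL and SQLite the equivalent "ignore" is ON CONFLICT ... DO NOTHING. A sketch via Python's built-in sqlite3 (needs SQLite >= 3.24; the records table here is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE records (prikey INTEGER PRIMARY KEY, foo TEXT, bar TEXT)")

con.execute("INSERT INTO records VALUES (1, 'Hello', 'World')")
# Same key again: the conflict is silently ignored and the row is unchanged.
con.execute(
    "INSERT INTO records VALUES (1, 'Hello', 'Quz') "
    "ON CONFLICT (prikey) DO NOTHING"
)

row = con.execute("SELECT foo, bar FROM records WHERE prikey = 1").fetchone()
print(row)  # ('Hello', 'World')
```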
Some setups track the attempts, though, using the equivalent of a column named Attempts. But I'm not sure how useful that is in practice, tbh:
INSERT
  INTO records (PriKey, Foo, Bar)
VALUES (1, 'Hello', 'World')
    ON DUPLICATE KEY
UPDATE Attempts=Attempts+1;
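The attempt-counting variant ports to the standard ON CONFLICT syntax as well; a sqlite3 sketch (column names hypothetical - Attempts starts at 1 on the first insert, and each duplicate bumps it):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE records (prikey INTEGER PRIMARY KEY, foo TEXT, "
    "attempts INTEGER NOT NULL DEFAULT 1)"
)

ins = (
    "INSERT INTO records (prikey, foo) VALUES (?, ?) "
    "ON CONFLICT (prikey) DO UPDATE SET attempts = attempts + 1"
)
for _ in range(3):  # one real insert, then two duplicates
    con.execute(ins, (1, "Hello"))

attempts = con.execute(
    "SELECT attempts FROM records WHERE prikey = 1"
).fetchone()[0]
print(attempts)  # 3
```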
> Since I can choose the DB, what is your opinion on MySQL vs. Postgres?
It really depends; frankly, I don't do anything these days that even requires a DB server, so I usually go with DBD::SQLite. That has its own things to deal with (forks, etc.), but it makes my life a lot easier not to have to maintain DB servers and all that stuff. YMMV based on the application, though.
Most relevant from the YMMV link,
SQLite does not compete with client/server databases. SQLite competes with fopen().