Here is a bash program that loads your sample data into a table 'data_locks_keys' and then extracts the distinct ('single') values of column 'key' (each with the first associated value of 'source') into a derived table 'data_locks_keys_distinct'.
Takes 1m 30s on 9.5dev postgres (low-end desktop).
#!/bin/sh

# wget http://white-barn.com/tmp/data_locks_keys.zip

time unzip -p data_locks_keys.zip \
  | psql -c "
      drop table if exists data_locks_keys;
      create table data_locks_keys(key text, source text);
      copy data_locks_keys from stdin with (
        format csv,
        delimiter E'|',
        header FALSE
      );
    "
# main table data_locks_keys now has 9,197,129 rows

echo "
  create table data_locks_keys_distinct as
    select distinct on (key) key, source
    from data_locks_keys
  ;
" | psql
# derived table data_locks_keys_distinct has 3,692,089 rows (with column 'key' now unique)
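Not part of the script above, but a quick sanity check you might run afterwards in psql (the unique index is just an assumption about how you'd want to use the derived table; it will error out if 'key' were somehow not unique):

  -- compare row count vs. number of distinct keys in the raw table
  select count(*)            as total_rows,
         count(distinct key) as distinct_keys
  from   data_locks_keys;

  -- optional: enforce/verify uniqueness of 'key' in the derived table
  create unique index on data_locks_keys_distinct (key);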
I'm getting really curious whether this DISTINCT ON statement isn't exactly what you're looking for...
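One caveat, in case it matters which 'source' gets kept: DISTINCT ON picks an unpredictable row per key unless you add an ORDER BY. A variant like this (just a sketch, reusing the tables above; ordering by 'source' is only an example) makes the choice deterministic:

  -- per key, keep the row with the smallest 'source' value
  create table data_locks_keys_distinct as
    select distinct on (key) key, source
    from data_locks_keys
    order by key, source
  ;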