++moritz for the idea of succinctly expressing this in SQL. (I will address it in Perl in a later post).
Perhaps I read the OP differently, but I think that this SQL, as given, will not produce the desired results.
Here is my test setup, using SQLite (tested on Win32 and Linux):
sqlite3 PM_617794.db
DROP TABLE IF EXISTS pdbs;
CREATE TABLE pdbs(
PdbId TEXT,
MetalAtm TEXT,
MetalName TEXT,
MetalNumber INTEGER,
MetalChn TEXT,
AtmName TEXT,
ResName TEXT,
ResNumber INTEGER,
ResChn TEXT,
Distance NUMERIC
);
INSERT INTO pdbs VALUES('1nc7','MG','MG',1501,'-','CL','CL',1401,'-',3
+.065);
INSERT INTO pdbs VALUES('1nc7','MG','MG',1501,'-','CL','CL',1402,'-',2
+.660);
INSERT INTO pdbs VALUES('1pex','CA','CA', 503,'-','CL','CL', 501,'-',2
+.895);
INSERT INTO pdbs VALUES('1qhu','NA','NA', 3,'-','CL','CL', 1,'-',3
+.096);
INSERT INTO pdbs VALUES('1qjs','NA','NA', 513,'A','CL','CL', 511,'A',3
+.096);
INSERT INTO pdbs VALUES('1rtg','CA','CA', 2,'-','CL','CL', 1,'-',3
+.079);
INSERT INTO pdbs VALUES('1xj6','NA','NA',5002,'-','CL','CL',5001,'-',2
+.628);
INSERT INTO pdbs VALUES('2caj','NI','NI',1151,'B','CL','CL',1150,'B',3
+.551);
INSERT INTO pdbs VALUES('2oa9','CD','CD', 911,'-','CL','CL', 800,'-',2
+.291);
INSERT INTO pdbs VALUES('2oa9','CD','CD', 913,'-','CL','CL', 801,'-',2
+.403);
.mode column pdbs
.width 4 2 2 4 1 2 2 4 1 5
Here is your query, with the FROM clause added, and the $columns resolved to '*':
SELECT *, COUNT(PdbId) as c
FROM pdbs
GROUP BY PdbId
HAVING c >= 2
;
Output:
1nc7 MG MG 1501 - CL CL 1402 - 2.66 2
2oa9 CD CD 913 - CL CL 801 - 2.403 2
There are two problems here:
- Any GROUP BY clause can produce only *one* record per unique grouped-by field. The OP needs *all* the records which have groupings of more than one record. To achieve this, you must either use
- a sub-Select, or
- two Selects with a temp table storing the results of the first Select for use by the second Select.
I don't see how it can be done with a single simple Select.
- Your SQL would produce a record if a PdbId had two records of *any* values of ResName. The sample data does not happen to have this scenario, but the code should not depend on that.
The issue can be resolved by
- filtering with a WHERE clause; WHERE ResName = 'CL', or
- (more maintainably and flexibly) by adding ResName to the GROUP BY clause, and also filtering with WHERE.
The OP *could* be read as using CL only as a representative of matching values; if so, then the WHERE is not needed, but the addition to GROUP BY becomes mandatory.
Here is your SQL, modified as 1a and 2b above:
SELECT pdbs.*
FROM pdbs JOIN (
SELECT PdbId, ResName, COUNT(*) as c
FROM pdbs
GROUP BY PdbId, ResName
HAVING c >= 2
) AS counter
ON pdbs.PdbId = counter.PdbId
AND pdbs.ResName = counter.ResName
WHERE pdbs.ResName = 'CL'
;
Outputs:
1nc7 MG MG 1501 - CL CL 1401 - 3.065
1nc7 MG MG 1501 - CL CL 1402 - 2.66
2oa9 CD CD 911 - CL CL 800 - 2.291
2oa9 CD CD 913 - CL CL 801 - 2.403
|