Re: (OT)Count the Column values if it same in the column

Replies are listed 'Best First'.
Re^2: (OT)Count the Column values if it same in the column by Util (Priest) on May 29, 2007 at 14:51 UTC
++moritz for the idea of succinctly expressing this in SQL. (I will address it in Perl in a later post). Perhaps I read the OP differently, but I think that this SQL, as given, will not produce the desired results. Here is my test setup, using SQLite (tested on Win32 and Linux): sqlite3 PM_617794.db DROP TABLE IF EXISTS pdbs; CREATE TABLE pdbs( PdbId TEXT, MetalAtm TEXT, MetalName TEXT, MetalNumber INTEGER, MetalChn TEXT, AtmName TEXT, ResName TEXT, ResNumber INTEGER, ResChn TEXT, Distance NUMERIC ); INSERT INTO pdbs VALUES('1nc7','MG','MG',1501,'-','CL','CL',1401,'-',3 +.065); INSERT INTO pdbs VALUES('1nc7','MG','MG',1501,'-','CL','CL',1402,'-',2 +.660); INSERT INTO pdbs VALUES('1pex','CA','CA', 503,'-','CL','CL', 501,'-',2 +.895); INSERT INTO pdbs VALUES('1qhu','NA','NA', 3,'-','CL','CL', 1,'-',3 +.096); INSERT INTO pdbs VALUES('1qjs','NA','NA', 513,'A','CL','CL', 511,'A',3 +.096); INSERT INTO pdbs VALUES('1rtg','CA','CA', 2,'-','CL','CL', 1,'-',3 +.079); INSERT INTO pdbs VALUES('1xj6','NA','NA',5002,'-','CL','CL',5001,'-',2 +.628); INSERT INTO pdbs VALUES('2caj','NI','NI',1151,'B','CL','CL',1150,'B',3 +.551); INSERT INTO pdbs VALUES('2oa9','CD','CD', 911,'-','CL','CL', 800,'-',2 +.291); INSERT INTO pdbs VALUES('2oa9','CD','CD', 913,'-','CL','CL', 801,'-',2 +.403); .mode column pdbs .width 4 2 2 4 1 2 2 4 1 5 [download] Here is your query, with the FROM clause added, and the $columns resolved to '': `SELECT , COUNT(PdbId) as c FROM pdbs GROUP BY PdbId HAVING c >= 2 ;` [download] Output: `1nc7 MG MG 1501 - CL CL 1402 - 2.66 2 2oa9 CD CD 913 - CL CL 801 - 2.403 2` [download] There are two problems here: Any GROUP BY clause can produce only one record per unique grouped-by field. The OP needs all the records which have groupings of more than one record. To achieve this, you must either use a sub-Select, or two Selects with a temp table storing the results of the first Select for use by the second Select. I don't see how it can be done with a single simple Select. Your SQL would produce a record if a PdbId had two records of any values of ResName. The sample data does not happen to have this scenario, but the code should not depend on that. The issue can be resolved by filtering with a WHERE clause; WHERE ResName = 'CL', or (more maintainably and flexibly) by adding ResName to the GROUP BY clause, and also filtering with WHERE. The OP could be read as using CL only as a representative of matching values; if so, then the WHERE is not needed, but the addition to GROUP BY becomes mandatory. Here is your SQL, modified as 1a and 2b above: `SELECT pdbs.* FROM pdbs JOIN ( SELECT PdbId, ResName, COUNT(*) as c FROM pdbs GROUP BY PdbId, ResName HAVING c >= 2 ) AS counter ON pdbs.PdbId = counter.PdbId AND pdbs.ResName = counter.ResName WHERE pdbs.ResName = 'CL' ;` [download] Outputs: `1nc7 MG MG 1501 - CL CL 1401 - 3.065 1nc7 MG MG 1501 - CL CL 1402 - 2.66 2oa9 CD CD 911 - CL CL 800 - 2.291 2oa9 CD CD 913 - CL CL 801 - 2.403` [download]	[reply] [d/l] [select]

Replies are listed 'Best First'.

Re^2: (OT)Count the Column values if it same in the column
by Util (Priest) on May 29, 2007 at 14:51 UTC

++moritz for the idea of succinctly expressing this in SQL. (I will address it in Perl in a later post).

Perhaps I read the OP differently, but I think that this SQL, as given, will not produce the desired results.

Here is my test setup, using SQLite (tested on Win32 and Linux):

sqlite3 PM_617794.db
DROP TABLE IF EXISTS pdbs;
CREATE TABLE pdbs(
  PdbId       TEXT,
  MetalAtm    TEXT,
  MetalName   TEXT,
  MetalNumber INTEGER,
  MetalChn    TEXT,
  AtmName     TEXT,
  ResName     TEXT,
  ResNumber   INTEGER,
  ResChn      TEXT,
  Distance    NUMERIC
);
INSERT INTO pdbs VALUES('1nc7','MG','MG',1501,'-','CL','CL',1401,'-',3
+.065);
INSERT INTO pdbs VALUES('1nc7','MG','MG',1501,'-','CL','CL',1402,'-',2
+.660);
INSERT INTO pdbs VALUES('1pex','CA','CA', 503,'-','CL','CL', 501,'-',2
+.895);
INSERT INTO pdbs VALUES('1qhu','NA','NA',   3,'-','CL','CL',   1,'-',3
+.096);
INSERT INTO pdbs VALUES('1qjs','NA','NA', 513,'A','CL','CL', 511,'A',3
+.096);
INSERT INTO pdbs VALUES('1rtg','CA','CA',   2,'-','CL','CL',   1,'-',3
+.079);
INSERT INTO pdbs VALUES('1xj6','NA','NA',5002,'-','CL','CL',5001,'-',2
+.628);
INSERT INTO pdbs VALUES('2caj','NI','NI',1151,'B','CL','CL',1150,'B',3
+.551);
INSERT INTO pdbs VALUES('2oa9','CD','CD', 911,'-','CL','CL', 800,'-',2
+.291);
INSERT INTO pdbs VALUES('2oa9','CD','CD', 913,'-','CL','CL', 801,'-',2
+.403);
.mode column pdbs
.width 4 2 2 4 1 2 2 4 1 5
[download]

Here is your query, with the FROM clause added, and the $columns resolved to '*':

SELECT   *, COUNT(PdbId) as c
FROM     pdbs
GROUP BY PdbId
HAVING   c >= 2
;
[download]

1nc7  MG  MG  1501  -  CL  CL  1402  -  2.66   2
2oa9  CD  CD  913   -  CL  CL  801   -  2.403  2
[download]

There are two problems here:

Any GROUP BY clause can produce only *one* record per unique grouped-by field. The OP needs *all* the records which have groupings of more than one record. To achieve this, you must either use
1. a sub-Select, or
2. two Selects with a temp table storing the results of the first Select for use by the second Select.
I don't see how it can be done with a single simple Select.
Your SQL would produce a record if a PdbId had two records of *any* values of ResName. The sample data does not happen to have this scenario, but the code should not depend on that. The issue can be resolved by
1. filtering with a WHERE clause; WHERE ResName = 'CL', or
2. (more maintainably and flexibly) by adding ResName to the GROUP BY clause, and also filtering with WHERE.
The OP *could* be read as using CL only as a representative of matching values; if so, then the WHERE is not needed, but the addition to GROUP BY becomes mandatory.

Here is your SQL, modified as 1a and 2b above:

SELECT     pdbs.*
FROM       pdbs JOIN (
  SELECT   PdbId, ResName, COUNT(*) as c
  FROM     pdbs
  GROUP BY PdbId, ResName
  HAVING   c >= 2
) AS       counter
ON         pdbs.PdbId   = counter.PdbId
AND        pdbs.ResName = counter.ResName
WHERE      pdbs.ResName = 'CL'
;
[download]

1nc7  MG  MG  1501  -  CL  CL  1401  -  3.065
1nc7  MG  MG  1501  -  CL  CL  1402  -  2.66
2oa9  CD  CD  911   -  CL  CL  800   -  2.291
2oa9  CD  CD  913   -  CL  CL  801   -  2.403
[download]

[reply]
[d/l]
[select]