(OT)Count the Column values if it same in the column

editi has asked for the wisdom of the Perl Monks concerning the following question:

Hai,
I have a table like,

+-------+----------+-----------+-------------+----------+---------+---
+------+-----------+--------+----------+
| PdbId | MetalAtm | MetalName | MetalNumber | MetalChn | AtmName | Re
+sName | ResNumber | ResChn | Distance |
+-------+----------+-----------+-------------+----------+---------+---
+------+-----------+--------+----------+
| 1nc7 | MG | MG | 1501 | - | CL | CL | 1401 | - | 3.065 |
| 1nc7 | MG | MG | 1501 | - | CL | CL | 1402 | - | 2.660 |
| 1pex | CA | CA | 503 | - | CL | CL | 501 | - | 2.895 |
| 1qhu | NA | NA | 3 | - | CL | CL | 1 | - | 3.096 |
| 1qjs | NA | NA | 513 | A | CL | CL | 511 | A | 3.096 |
| 1rtg | CA | CA | 2 | - | CL | CL | 1 | - | 3.079 |
| 1xj6 | NA | NA | 5002 | - | CL | CL | 5001 | - | 2.628 |
| 2caj | NI | NI | 1151 | B | CL | CL | 1150 | B | 3.551 |
| 2oa9 | CD | CD | 911 | - | CL | CL | 800 | - | 2.291 |
| 2oa9 | CD | CD | 913 | - | CL | CL | 801 | - | 2.403 |
+-------+----------+-----------+-------------+----------+---------+---
+------+-----------+--------+----------+
[download]

I want to retrieve the rows only the count must have of single PdbId with RESNAME=CL should >=2 that means i want to retreive only

+-------+----------+-----------+-------------+----------+---------+---
+------+-----------+--------+----------+
| PdbId | MetalAtm | MetalName | MetalNumber | MetalChn | AtmName | Re
+sName | ResNumber | ResChn | Distance |
+-------+----------+-----------+-------------+----------+---------+---
+------+-----------+--------+----------+
| 1nc7 | MG | MG | 1501 | - | CL | CL | 1401 | - | 3.065 |
| 1nc7 | MG | MG | 1501 | - | CL | CL | 1402 | - | 2.660 |
| 2oa9 | CD | CD | 911 | - | CL | CL | 800 | - | 2.291 |
| 2oa9 | CD | CD | 913 | - | CL | CL | 801 | - | 2.403 |
+-------+----------+-----------+-------------+----------+---------+---
+------+-----------+--------+----------+
[download]

thanks in advance

Comment on (OT)Count the Column values if it same in the column Select or Download Code

Replies are listed 'Best First'.
Re: (OT)Count the Column values if it same in the column by moritz (Cardinal) on May 28, 2007 at 09:48 UTC
You can achieve that by using `SELECT $columns, COUNT(PdbID) as c GROUP BY PdbId HAVING c >= 2` [download] (this is mysql syntax and untetsted, but it should be easy to apply to other systems) Perl 6 in German	[reply] [d/l]
Re^2: (OT)Count the Column values if it same in the column by Util (Priest) on May 29, 2007 at 14:51 UTC
++moritz for the idea of succinctly expressing this in SQL. (I will address it in Perl in a later post). Perhaps I read the OP differently, but I think that this SQL, as given, will not produce the desired results. Here is my test setup, using SQLite (tested on Win32 and Linux): sqlite3 PM_617794.db DROP TABLE IF EXISTS pdbs; CREATE TABLE pdbs( PdbId TEXT, MetalAtm TEXT, MetalName TEXT, MetalNumber INTEGER, MetalChn TEXT, AtmName TEXT, ResName TEXT, ResNumber INTEGER, ResChn TEXT, Distance NUMERIC ); INSERT INTO pdbs VALUES('1nc7','MG','MG',1501,'-','CL','CL',1401,'-',3 +.065); INSERT INTO pdbs VALUES('1nc7','MG','MG',1501,'-','CL','CL',1402,'-',2 +.660); INSERT INTO pdbs VALUES('1pex','CA','CA', 503,'-','CL','CL', 501,'-',2 +.895); INSERT INTO pdbs VALUES('1qhu','NA','NA', 3,'-','CL','CL', 1,'-',3 +.096); INSERT INTO pdbs VALUES('1qjs','NA','NA', 513,'A','CL','CL', 511,'A',3 +.096); INSERT INTO pdbs VALUES('1rtg','CA','CA', 2,'-','CL','CL', 1,'-',3 +.079); INSERT INTO pdbs VALUES('1xj6','NA','NA',5002,'-','CL','CL',5001,'-',2 +.628); INSERT INTO pdbs VALUES('2caj','NI','NI',1151,'B','CL','CL',1150,'B',3 +.551); INSERT INTO pdbs VALUES('2oa9','CD','CD', 911,'-','CL','CL', 800,'-',2 +.291); INSERT INTO pdbs VALUES('2oa9','CD','CD', 913,'-','CL','CL', 801,'-',2 +.403); .mode column pdbs .width 4 2 2 4 1 2 2 4 1 5 [download] Here is your query, with the FROM clause added, and the $columns resolved to '': `SELECT , COUNT(PdbId) as c FROM pdbs GROUP BY PdbId HAVING c >= 2 ;` [download] Output: `1nc7 MG MG 1501 - CL CL 1402 - 2.66 2 2oa9 CD CD 913 - CL CL 801 - 2.403 2` [download] There are two problems here: Any GROUP BY clause can produce only one record per unique grouped-by field. The OP needs all the records which have groupings of more than one record. To achieve this, you must either use a sub-Select, or two Selects with a temp table storing the results of the first Select for use by the second Select. I don't see how it can be done with a single simple Select. Your SQL would produce a record if a PdbId had two records of any values of ResName. The sample data does not happen to have this scenario, but the code should not depend on that. The issue can be resolved by filtering with a WHERE clause; WHERE ResName = 'CL', or (more maintainably and flexibly) by adding ResName to the GROUP BY clause, and also filtering with WHERE. The OP could be read as using CL only as a representative of matching values; if so, then the WHERE is not needed, but the addition to GROUP BY becomes mandatory. Here is your SQL, modified as 1a and 2b above: `SELECT pdbs.* FROM pdbs JOIN ( SELECT PdbId, ResName, COUNT(*) as c FROM pdbs GROUP BY PdbId, ResName HAVING c >= 2 ) AS counter ON pdbs.PdbId = counter.PdbId AND pdbs.ResName = counter.ResName WHERE pdbs.ResName = 'CL' ;` [download] Outputs: `1nc7 MG MG 1501 - CL CL 1401 - 3.065 1nc7 MG MG 1501 - CL CL 1402 - 2.66 2oa9 CD CD 911 - CL CL 800 - 2.291 2oa9 CD CD 913 - CL CL 801 - 2.403` [download]	[reply] [d/l] [select]
Re: (OT)Count the Column values if it same in the column by Util (Priest) on May 29, 2007 at 14:51 UTC
Here is a solution in Perl. Working, tested code: use strict; use warnings; my @field_names = qw( PdbId MetalAtm MetalName MetalNumber MetalChn AtmName ResName ResNumber ResChn Distance ); # Trick for concise sample data. sub aref_to_href { my @field_values = @_; die unless @field_names == @field_values; my %h; @h{@field_names} = @field_values; \%h; } my @records_AoH = map { aref_to_href( @{$_} ) } ( [ qw( 1nc7 MG MG 1501 - CL CL 1401 - 3.065 ) ], [ qw( 1nc7 MG MG 1501 - CL CL 1402 - 2.660 ) ], [ qw( 1pex CA CA 503 - CL CL 501 - 2.895 ) ], [ qw( 1qhu NA NA 3 - CL CL 1 - 3.096 ) ], [ qw( 1qjs NA NA 513 A CL CL 511 A 3.096 ) ], [ qw( 1rtg CA CA 2 - CL CL 1 - 3.079 ) ], [ qw( 1xj6 NA NA 5002 - CL CL 5001 - 2.628 ) ], [ qw( 2caj NI NI 1151 B CL CL 1150 B 3.551 ) ], [ qw( 2oa9 CD CD 911 - CL CL 800 - 2.291 ) ], [ qw( 2oa9 CD CD 913 - CL CL 801 - 2.403 ) ], # Added for testing different ResName columns [ qw( 99zz CD CD 911 - CL YY 800 - 2.291 ) ], [ qw( 99zz CD CD 913 - CL YY 801 - 2.403 ) ], ); my %count; $count{ $_->{PdbId} }{ $_->{ResName} }++ foreach @records_AoH; # Uncomment the second `grep` line if needed. # (See the SQL discussion in the previous node). my @filtered_AoH = grep { $count{$_->{PdbId}}{$_->{ResName}} >= 2 } # grep { $_->{ResName} eq 'CL' } @records_AoH; my $template = '%5s %8s %9s %11s %8s %7s %7s %9s %6s %8s'; printf "$template\n", @field_names; foreach (@filtered_AoH) { printf "$template\n", @{ $_ }{@field_names}; } [download] Output: `PdbId MetalAtm MetalName MetalNumber MetalChn AtmName ResName ResNumbe +r ResChn Distance 1nc7 MG MG 1501 - CL CL 140 +1 - 3.065 1nc7 MG MG 1501 - CL CL 140 +2 - 2.660 2oa9 CD CD 911 - CL CL 80 +0 - 2.291 2oa9 CD CD 913 - CL CL 80 +1 - 2.403 99zz CD CD 911 - CL YY 80 +0 - 2.291 99zz CD CD 913 - CL YY 80 +1 - 2.403` [download] Output with the second `grep` uncommented: `PdbId MetalAtm MetalName MetalNumber MetalChn AtmName ResName ResNumbe +r ResChn Distance 1nc7 MG MG 1501 - CL CL 140 +1 - 3.065 1nc7 MG MG 1501 - CL CL 140 +2 - 2.660 2oa9 CD CD 911 - CL CL 80 +0 - 2.291 2oa9 CD CD 913 - CL CL 80 +1 - 2.403` [download]	[reply] [d/l] [select]