editi has asked for the wisdom of the Perl Monks concerning the following question:

Hai,
I have a table like,
+-------+----------+-----------+-------------+----------+---------+--- +------+-----------+--------+----------+ | PdbId | MetalAtm | MetalName | MetalNumber | MetalChn | AtmName | Re +sName | ResNumber | ResChn | Distance | +-------+----------+-----------+-------------+----------+---------+--- +------+-----------+--------+----------+ | 1nc7 | MG | MG | 1501 | - | CL | CL | 1401 | - | 3.065 | | 1nc7 | MG | MG | 1501 | - | CL | CL | 1402 | - | 2.660 | | 1pex | CA | CA | 503 | - | CL | CL | 501 | - | 2.895 | | 1qhu | NA | NA | 3 | - | CL | CL | 1 | - | 3.096 | | 1qjs | NA | NA | 513 | A | CL | CL | 511 | A | 3.096 | | 1rtg | CA | CA | 2 | - | CL | CL | 1 | - | 3.079 | | 1xj6 | NA | NA | 5002 | - | CL | CL | 5001 | - | 2.628 | | 2caj | NI | NI | 1151 | B | CL | CL | 1150 | B | 3.551 | | 2oa9 | CD | CD | 911 | - | CL | CL | 800 | - | 2.291 | | 2oa9 | CD | CD | 913 | - | CL | CL | 801 | - | 2.403 | +-------+----------+-----------+-------------+----------+---------+--- +------+-----------+--------+----------+

I want to retrieve the rows only the count must have of single PdbId with RESNAME=CL should >=2 that means i want to retreive only

+-------+----------+-----------+-------------+----------+---------+--- +------+-----------+--------+----------+ | PdbId | MetalAtm | MetalName | MetalNumber | MetalChn | AtmName | Re +sName | ResNumber | ResChn | Distance | +-------+----------+-----------+-------------+----------+---------+--- +------+-----------+--------+----------+ | 1nc7 | MG | MG | 1501 | - | CL | CL | 1401 | - | 3.065 | | 1nc7 | MG | MG | 1501 | - | CL | CL | 1402 | - | 2.660 | | 2oa9 | CD | CD | 911 | - | CL | CL | 800 | - | 2.291 | | 2oa9 | CD | CD | 913 | - | CL | CL | 801 | - | 2.403 | +-------+----------+-----------+-------------+----------+---------+--- +------+-----------+--------+----------+

thanks in advance

Replies are listed 'Best First'.
Re: (OT)Count the Column values if it same in the column
by moritz (Cardinal) on May 28, 2007 at 09:48 UTC
    You can achieve that by using
    SELECT $columns, COUNT(PdbID) as c GROUP BY PdbId HAVING c >= 2
    (this is mysql syntax and untetsted, but it should be easy to apply to other systems)

      ++moritz for the idea of succinctly expressing this in SQL. (I will address it in Perl in a later post).

      Perhaps I read the OP differently, but I think that this SQL, as given, will not produce the desired results.

      Here is my test setup, using SQLite (tested on Win32 and Linux):

      sqlite3 PM_617794.db DROP TABLE IF EXISTS pdbs; CREATE TABLE pdbs( PdbId TEXT, MetalAtm TEXT, MetalName TEXT, MetalNumber INTEGER, MetalChn TEXT, AtmName TEXT, ResName TEXT, ResNumber INTEGER, ResChn TEXT, Distance NUMERIC ); INSERT INTO pdbs VALUES('1nc7','MG','MG',1501,'-','CL','CL',1401,'-',3 +.065); INSERT INTO pdbs VALUES('1nc7','MG','MG',1501,'-','CL','CL',1402,'-',2 +.660); INSERT INTO pdbs VALUES('1pex','CA','CA', 503,'-','CL','CL', 501,'-',2 +.895); INSERT INTO pdbs VALUES('1qhu','NA','NA', 3,'-','CL','CL', 1,'-',3 +.096); INSERT INTO pdbs VALUES('1qjs','NA','NA', 513,'A','CL','CL', 511,'A',3 +.096); INSERT INTO pdbs VALUES('1rtg','CA','CA', 2,'-','CL','CL', 1,'-',3 +.079); INSERT INTO pdbs VALUES('1xj6','NA','NA',5002,'-','CL','CL',5001,'-',2 +.628); INSERT INTO pdbs VALUES('2caj','NI','NI',1151,'B','CL','CL',1150,'B',3 +.551); INSERT INTO pdbs VALUES('2oa9','CD','CD', 911,'-','CL','CL', 800,'-',2 +.291); INSERT INTO pdbs VALUES('2oa9','CD','CD', 913,'-','CL','CL', 801,'-',2 +.403); .mode column pdbs .width 4 2 2 4 1 2 2 4 1 5

      Here is your query, with the FROM clause added, and the $columns resolved to '*':

      SELECT *, COUNT(PdbId) as c FROM pdbs GROUP BY PdbId HAVING c >= 2 ;
      Output:
      1nc7 MG MG 1501 - CL CL 1402 - 2.66 2 2oa9 CD CD 913 - CL CL 801 - 2.403 2

      There are two problems here:

      1. Any GROUP BY clause can produce only *one* record per unique grouped-by field. The OP needs *all* the records which have groupings of more than one record. To achieve this, you must either use
        1. a sub-Select, or
        2. two Selects with a temp table storing the results of the first Select for use by the second Select.
        I don't see how it can be done with a single simple Select.
      2. Your SQL would produce a record if a PdbId had two records of *any* values of ResName. The sample data does not happen to have this scenario, but the code should not depend on that. The issue can be resolved by
        1. filtering with a WHERE clause; WHERE ResName = 'CL', or
        2. (more maintainably and flexibly) by adding ResName to the GROUP BY clause, and also filtering with WHERE.
        The OP *could* be read as using CL only as a representative of matching values; if so, then the WHERE is not needed, but the addition to GROUP BY becomes mandatory.

      Here is your SQL, modified as 1a and 2b above:

      SELECT pdbs.* FROM pdbs JOIN ( SELECT PdbId, ResName, COUNT(*) as c FROM pdbs GROUP BY PdbId, ResName HAVING c >= 2 ) AS counter ON pdbs.PdbId = counter.PdbId AND pdbs.ResName = counter.ResName WHERE pdbs.ResName = 'CL' ;
      Outputs:
      1nc7 MG MG 1501 - CL CL 1401 - 3.065 1nc7 MG MG 1501 - CL CL 1402 - 2.66 2oa9 CD CD 911 - CL CL 800 - 2.291 2oa9 CD CD 913 - CL CL 801 - 2.403

Re: (OT)Count the Column values if it same in the column
by Util (Priest) on May 29, 2007 at 14:51 UTC

    Here is a solution in Perl.

    Working, tested code:

    use strict; use warnings; my @field_names = qw( PdbId MetalAtm MetalName MetalNumber MetalChn AtmName ResName ResNumber ResChn Distance ); # Trick for concise sample data. sub aref_to_href { my @field_values = @_; die unless @field_names == @field_values; my %h; @h{@field_names} = @field_values; \%h; } my @records_AoH = map { aref_to_href( @{$_} ) } ( [ qw( 1nc7 MG MG 1501 - CL CL 1401 - 3.065 ) ], [ qw( 1nc7 MG MG 1501 - CL CL 1402 - 2.660 ) ], [ qw( 1pex CA CA 503 - CL CL 501 - 2.895 ) ], [ qw( 1qhu NA NA 3 - CL CL 1 - 3.096 ) ], [ qw( 1qjs NA NA 513 A CL CL 511 A 3.096 ) ], [ qw( 1rtg CA CA 2 - CL CL 1 - 3.079 ) ], [ qw( 1xj6 NA NA 5002 - CL CL 5001 - 2.628 ) ], [ qw( 2caj NI NI 1151 B CL CL 1150 B 3.551 ) ], [ qw( 2oa9 CD CD 911 - CL CL 800 - 2.291 ) ], [ qw( 2oa9 CD CD 913 - CL CL 801 - 2.403 ) ], # Added for testing different ResName columns [ qw( 99zz CD CD 911 - CL YY 800 - 2.291 ) ], [ qw( 99zz CD CD 913 - CL YY 801 - 2.403 ) ], ); my %count; $count{ $_->{PdbId} }{ $_->{ResName} }++ foreach @records_AoH; # Uncomment the second `grep` line if needed. # (See the SQL discussion in the previous node). my @filtered_AoH = grep { $count{$_->{PdbId}}{$_->{ResName}} >= 2 } # grep { $_->{ResName} eq 'CL' } @records_AoH; my $template = '%5s %8s %9s %11s %8s %7s %7s %9s %6s %8s'; printf "$template\n", @field_names; foreach (@filtered_AoH) { printf "$template\n", @{ $_ }{@field_names}; }
    Output:
    PdbId MetalAtm MetalName MetalNumber MetalChn AtmName ResName ResNumbe +r ResChn Distance 1nc7 MG MG 1501 - CL CL 140 +1 - 3.065 1nc7 MG MG 1501 - CL CL 140 +2 - 2.660 2oa9 CD CD 911 - CL CL 80 +0 - 2.291 2oa9 CD CD 913 - CL CL 80 +1 - 2.403 99zz CD CD 911 - CL YY 80 +0 - 2.291 99zz CD CD 913 - CL YY 80 +1 - 2.403
    Output with the second `grep` uncommented:
    PdbId MetalAtm MetalName MetalNumber MetalChn AtmName ResName ResNumbe +r ResChn Distance 1nc7 MG MG 1501 - CL CL 140 +1 - 3.065 1nc7 MG MG 1501 - CL CL 140 +2 - 2.660 2oa9 CD CD 911 - CL CL 80 +0 - 2.291 2oa9 CD CD 913 - CL CL 80 +1 - 2.403