MonkPaul has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm stuck on a problem where I need to count the occurrences of an Id in a hash. The Id is the value and the key is a string that contains a keyword.

An example is:

KEY                                 Value
Affy:HG_U133A:213119_at:74:303;     ENSG00000123643
Affy:HG_U133A:213119_at:542:439;    ENSG00000123643
Affy:HG_U133A:213119_at:658:369;    ENSG00000123643
Affy:HG_U133A:213119_at:199:255;    ENSG00000123643
Affy:HG_U133A:213119_at:436:453;    ENSG00000123643
Affy:HG_U133A:213119_at:324:381;    ENSG00000458158
Affy:HG_U133A:213119_at:584:557;    ENSG00000123643
Affy:HG_U133A:213119_at:234:507;    ENSG00000123643
Affy:HG_U133A:213119_at:482:429;    ENSG00000123643
Affy:HG_U133A:213119_at:608:451;    ENSG00000458158
Affy:HG_U133A:213119_at:356:297;    ENSG00000123643

What I'm after is to count the Ids for the highlighted string (the 213119_at part of a key such as Affy:HG_U133A:213119_at:356:297;), and then to choose the most regularly occurring one. So this would give me:
213119_at : ENSG00000123643 = 9 (thank you wfsp)
213119_at : ENSG00000458158 = 2

I would then choose ENSG00000123643 to store in another hash with 213119_at, e.g. 213119_at{ENSG00000123643}.

I have looked at 100967 but it's not what I'm after.
Does anybody have any ideas?

cheers,
MonkPaul.

Re: Counting hash values
by wfsp (Abbot) on May 26, 2006 at 11:48 UTC
    Here's one way - construct another hash:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash = map {split /\s+/} <DATA>;

    my %new_hash;
    for my $key (keys %hash){
        if ($key =~ /213119_at/){
            $new_hash{$hash{$key}}++;
        }
    }
    print Dumper \%new_hash;

    __DATA__
    Affy:HG_U133A:213119_at:74:303; ENSG00000123643
    Affy:HG_U133A:213119_at:542:439; ENSG00000123643
    Affy:HG_U133A:213119_at:658:369; ENSG00000123643
    Affy:HG_U133A:213119_at:199:255; ENSG00000123643
    Affy:HG_U133A:213119_at:436:453; ENSG00000123643
    Affy:HG_U133A:213119_at:324:381; ENSG00000458158
    Affy:HG_U133A:213119_at:584:557; ENSG00000123643
    Affy:HG_U133A:213119_at:234:507; ENSG00000123643
    Affy:HG_U133A:213119_at:482:429; ENSG00000123643
    Affy:HG_U133A:213119_at:608:451; ENSG00000458158
    Affy:HG_U133A:213119_at:356:297; ENSG00000123643
    output:
    ---------- Capture Output ----------
    > "C:\Perl\bin\perl.exe" _new.pl
    $VAR1 = {
              'ENSG00000123643' => 9,
              'ENSG00000458158' => 2
            };
    > Terminated with exit code 0.
    note: there are 9 :-)
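
    To get from these counts to the single most common Id, one possible follow-up is to sort the Ids by count. This is only a sketch: the %new_hash literal simply copies the Dumper output above, and the alphabetical tie-break is an assumption of mine, not something the question asks for.

    ```perl
    use strict;
    use warnings;

    # Counts copied from the Dumper output above.
    my %new_hash = (
        ENSG00000123643 => 9,
        ENSG00000458158 => 2,
    );

    # Sort by descending count; break ties alphabetically (an assumption)
    # so the result does not depend on hash ordering.
    my ($most_common) = sort {
        $new_hash{$b} <=> $new_hash{$a} or $a cmp $b
    } keys %new_hash;

    print "$most_common = $new_hash{$most_common}\n";
    ```

    With only a handful of distinct Ids per key, the cost of sorting should be negligible.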
Re: Counting hash values
by GrandFather (Saint) on May 26, 2006 at 11:49 UTC

    This doesn't match your sample output, but it does match my understanding of your description.

    use strict;
    use warnings;

    my %oHash;
    my %count;
    my $best = 0;
    my $bestCode;

    while (<DATA>) {
        chomp;
        my ($code) = /\s+(.*)/;
        next if ++$count{$code} <= $best;
        $best = $count{$code};
        $bestCode = $code;
    }

    print "Found $best occurances of $bestCode\n";

    __DATA__
    Affy:HG_U133A:213119_at:74:303; ENSG00000123643
    Affy:HG_U133A:213119_at:542:439; ENSG00000123643
    Affy:HG_U133A:213119_at:658:369; ENSG00000123643
    Affy:HG_U133A:213119_at:199:255; ENSG00000123643
    Affy:HG_U133A:213119_at:436:453; ENSG00000123643
    Affy:HG_U133A:213119_at:324:381; ENSG00000458158
    Affy:HG_U133A:213119_at:584:557; ENSG00000123643
    Affy:HG_U133A:213119_at:234:507; ENSG00000123643
    Affy:HG_U133A:213119_at:482:429; ENSG00000123643
    Affy:HG_U133A:213119_at:608:451; ENSG00000458158
    Affy:HG_U133A:213119_at:356:297; ENSG00000123643

    Prints:

    Found 9 occurances of ENSG00000123643
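
    The same running-maximum idea can be kept per probe id if more than one id shows up in the data. A rough sketch, assuming the id is always the third colon-separated field of the key; the @lines sample and the %count and %best names are made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample lines in the same "key value" layout.
    my @lines = (
        'Affy:HG_U133A:213119_at:74:303; ENSG00000123643',
        'Affy:HG_U133A:213119_at:324:381; ENSG00000458158',
        'Affy:HG_U133A:213119_at:542:439; ENSG00000123643',
        'Affy:HG_U133A:213118_at:584:557; ENSG00000123623',
    );

    my (%count, %best);
    for my $line (@lines) {
        my ($key, $code) = split ' ', $line;
        my $id = (split /:/, $key)[2];    # third colon-separated field

        # Remember the code with the highest count seen so far for this id.
        my $n = ++$count{$id}{$code};
        $best{$id} = $code
            if !defined $best{$id} or $n > $count{$id}{ $best{$id} };
    }

    print "$_ => $best{$_}\n" for sort keys %best;
    ```

    On a tie, the code that reached the count first is kept, which mirrors the behaviour of the loop above.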

    DWIM is Perl's answer to Gödel
Re: Counting hash values
by xdg (Monsignor) on May 26, 2006 at 11:49 UTC

    Update: Just to clarify, the answer I give below assumes that you're interested in a general solution, not just the case where "213119_at" is the only value of interest, as other posts seem to assume from the literal sample you gave. I base this on the comment at the end of your post:

    "I would then choose ENSG00000123643 to store in another hash with 213119_at, e.g. 213119_at{ENSG00000123643}."

    Storing it in a hash makes me think that you're wanting to do this for all other similar values as well.

    Original:

    While I'm sure this could be golfed, my thought is just to build it up in the stages you describe. (Note: I changed some numbers to get two answers in the final hash.)

    use strict;
    use warnings;
    use Data::Dump::Streamer;

    my %original = split " ", do { local $/; <DATA> };

    my %count;
    while ( my( $k, $v ) = each %original ) {
        $k =~ s/\A[^:]+:[^:]+:([^:]+).*/$1/;
        $count{$k}{$v}++;
    }

    my %summary;
    while ( my ($k, $v ) = each %count ) {
        my $max = [ sort { $v->{$b} <=> $v->{$a} } keys %$v]->[0];
        $summary{$k} = $max;
    }

    Dump \%summary;

    __DATA__
    Affy:HG_U133A:213119_at:74:303; ENSG00000123643
    Affy:HG_U133A:213119_at:542:439; ENSG00000123643
    Affy:HG_U133A:213119_at:658:369; ENSG00000123643
    Affy:HG_U133A:213119_at:199:255; ENSG00000123643
    Affy:HG_U133A:213119_at:436:453; ENSG00000123643
    Affy:HG_U133A:213119_at:324:381; ENSG00000458158
    Affy:HG_U133A:213118_at:584:557; ENSG00000123623
    Affy:HG_U133A:213118_at:234:507; ENSG00000123623
    Affy:HG_U133A:213118_at:482:429; ENSG00000123623
    Affy:HG_U133A:213118_at:608:451; ENSG00000458158
    Affy:HG_U133A:213118_at:356:297; ENSG00000123623

    Prints

    $HASH1 = {
               "213118_at" => 'ENSG00000123623',
               "213119_at" => 'ENSG00000123643'
             };

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

      If I understand your update correctly...
      I'm actually looking at several of these keys: 213118_at, 456218_at, 279548_at. Each of these will be in the format:
      Affy:HG_U133A:ID_HERE:74:303;
      with the obvious replacement of the middle id (key) value. The data in this format will again "probably" have a similar situation, with multiple ids (values) per key, as in my original example. I'm after the id value that occurs most frequently, e.g. ENSG00000123623. This has to be stored with the original key so I then know which key corresponds to the most abundant value. I'm not at all interested in keeping the less frequent values, so these can just be dropped.
      A bit of a mouthful I know, but it's the way my head works.

      MonkPaul.

        That's what I thought. My answer picks out the ID as being between the second and third colons.
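
        For anyone following along, the effect of that substitution on a single key can be checked in isolation. A small sketch; the split-based variant is only an assumption about an equivalent way to do it, not part of the answer above.

        ```perl
        use strict;
        use warnings;

        my $key = 'Affy:HG_U133A:213119_at:74:303;';

        # Same substitution as in the answer: keep only the third field.
        (my $by_regex = $key) =~ s/\A[^:]+:[^:]+:([^:]+).*/$1/;

        # Possible alternative: split on colons and take index 2.
        my $by_split = (split /:/, $key)[2];

        print "$by_regex $by_split\n";
        ```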

        -xdg


Re: Counting hash values
by jwkrahn (Abbot) on May 26, 2006 at 11:48 UTC
    This should give you a count of the values you want:
    my %id_count;
    $id_count{ $hash{ $_ } }++ for grep /:213119_at:/, keys %hash;
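
    One detail worth noting is that the colons in /:213119_at:/ keep the match to the whole field. A sketch with a made-up %hash; the 1213119_at key and its value are hypothetical, added only to show the difference from an unanchored pattern.

    ```perl
    use strict;
    use warnings;

    # Made-up data; the last key is a hypothetical near-miss id.
    my %hash = (
        'Affy:HG_U133A:213119_at:74:303;'  => 'ENSG00000123643',
        'Affy:HG_U133A:213119_at:324:381;' => 'ENSG00000458158',
        'Affy:HG_U133A:213119_at:542:439;' => 'ENSG00000123643',
        'Affy:HG_U133A:1213119_at:10:20;'  => 'ENSG00000999999',
    );

    my %id_count;
    $id_count{ $hash{$_} }++ for grep /:213119_at:/, keys %hash;

    # An unanchored /213119_at/ would also pick up the near-miss key.
    my $loose = grep /213119_at/, keys %hash;

    print "$_ = $id_count{$_}\n" for sort keys %id_count;
    print "unanchored pattern matches $loose keys\n";
    ```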