in reply to Counting hash values

Update: Just to clarify, the answer I give below assumes that you're interested in a general solution, not just the case where "213119_at" is the only value of interest, as other posts seem to assume from the literal sample you gave. I base this on the comment at the end of your post:

"I would then choose ENSG00000123643 to store in another hash with 213119_at, e.g. 213119_at{ENSG00000123643}."

Storing it in a hash makes me think that you're wanting to do this for all other similar values as well.

Original:

While I'm sure this could be golfed, my thought is just to build it up in the stages you describe. (Note: I changed some numbers to get two answers in the final hash.)

use strict;
use warnings;
use Data::Dump::Streamer;

# Slurp the whole DATA section and split on whitespace into key/value pairs.
my %original = split " ", do { local $/; <DATA> };

# Count how often each value occurs for each ID (the third colon-separated field).
my %count;
while ( my ( $k, $v ) = each %original ) {
    $k =~ s/\A[^:]+:[^:]+:([^:]+).*/$1/;
    $count{$k}{$v}++;
}

# For each ID, keep only the most frequent value.
my %summary;
while ( my ( $k, $v ) = each %count ) {
    my $max = [ sort { $v->{$b} <=> $v->{$a} } keys %$v ]->[0];
    $summary{$k} = $max;
}

Dump \%summary;

__DATA__
Affy:HG_U133A:213119_at:74:303; ENSG00000123643
Affy:HG_U133A:213119_at:542:439; ENSG00000123643
Affy:HG_U133A:213119_at:658:369; ENSG00000123643
Affy:HG_U133A:213119_at:199:255; ENSG00000123643
Affy:HG_U133A:213119_at:436:453; ENSG00000123643
Affy:HG_U133A:213119_at:324:381; ENSG00000458158
Affy:HG_U133A:213118_at:584:557; ENSG00000123623
Affy:HG_U133A:213118_at:234:507; ENSG00000123623
Affy:HG_U133A:213118_at:482:429; ENSG00000123623
Affy:HG_U133A:213118_at:608:451; ENSG00000458158
Affy:HG_U133A:213118_at:356:297; ENSG00000123623

Prints

$HASH1 = {
           "213118_at" => 'ENSG00000123623',
           "213119_at" => 'ENSG00000123643'
         };
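One thing the sort in the code above leaves open is what happens when two values tie for the highest count: which one lands first depends on hash ordering. A minimal sketch of one way to make that deterministic, assuming an alphabetical tie-break is acceptable. The helper name most_frequent and the hard-coded %count are illustrative only and are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: given a hash ref of value => count, return the value
# with the highest count, breaking ties alphabetically (an assumption).
sub most_frequent {
    my ($counts) = @_;
    my ($best) =
        sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts;
    return $best;
}

# Counts as they would stand after the counting stage for the sample data.
my %count = (
    '213119_at' => { ENSG00000123643 => 5, ENSG00000458158 => 1 },
    '213118_at' => { ENSG00000123623 => 4, ENSG00000458158 => 1 },
);

my %summary = map { $_ => most_frequent( $count{$_} ) } keys %count;

print "$_ => $summary{$_}\n" for sort keys %summary;
```

For the sample data there are no ties, so the result is the same as before; the extra comparison only matters when two values occur equally often.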

-xdg

Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re^2: Counting hash values
by MonkPaul (Friar) on May 26, 2006 at 14:17 UTC
    If I understand your update correctly...
    I'm actually looking at several of these keys: 213118_at, 456218_at, 279548_at. Each of these will be in the format:
    Affy:HG_U133A:ID_HERE:74:303;
    With the obvious replacement of the middle ID (key) value. The data in this format will again "probably" have a similar situation, with multiple IDs (values) per key, as in my original example. I'm after the ID value that occurs most frequently, e.g. ENSG00000123623. This has to be stored with the original key so I then know which key corresponds to the most abundant value. I'm not at all interested in keeping the less frequent values, so these can just be dropped.
    A bit of a mouthful I know, but it's the way my head works.

    MonkPaul.

      That's what I thought. My answer picks out the ID as being between the second and third colons.
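
To show the extraction step on its own, here is a small sketch that captures the field between the second and third colons with the same pattern shape as the substitution in my answer. The helper name probe_id is made up for this example, and the second key is built from one of the IDs you mention plus made-up coordinates.

```perl
use strict;
use warnings;

# Hypothetical helper: return the third colon-separated field of a key,
# or undef if the key has fewer than three non-empty leading fields.
sub probe_id {
    my ($key) = @_;
    my ($id) = $key =~ /\A[^:]+:[^:]+:([^:]+)/;
    return $id;
}

for my $key ( 'Affy:HG_U133A:213118_at:584:557;',
              'Affy:HG_U133A:456218_at:74:303;' ) {
    print probe_id($key), "\n";
}
```

Capturing into a new variable rather than substituting in place leaves the original key untouched, which may be handy if you still need the full key later.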

      -xdg
