how can I print my hash values once?

lomSpace has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
I have created a hash after parsing a file in order to count the occurrence of values.
I am able to calculate and print the counts for the values, but each line is written to the output multiple times. How can I print each one only once?

use strict;
#open file
open(my $in,"/Users/mydir/Desktop/CCDS.current.txt") or die " Can't open file: $!";
#open out file
open(OUT, ">/Users/mydir/Desktop/genesperchrcnt.txt");
# initialize the hash
my %geneids=();
#open the file and push the info from the designated columns into it
# remove header
my $firstline = <$in>;
chomp $firstline;
while(<$in>){
    chomp; # remove the newline character
    my @fields = split (/\t/);
    #extract the columns that we are interested in.
    # Populate the key value pairs of the hash with $gene and $id
    $geneids{$fields[2]} = $fields[0];
    # initialize an array to store hash values
    my @chr;
    push @chr, $fields[0];
    #count chromosome number which is the value in the hash
    $geneids{$fields[0]}++;
    next if $geneids{$fields[0]} > 1;
    foreach my $values (sort values %geneids) {
        print OUT "Chromosome $values has $geneids{$values} genes\n"; 
    }
}
close($in);
close(OUT);
=cut Output
Chromosome 1 has 1635 genes
Chromosome 1 has 1635 genes
Chromosome 1 has 1635 genes
Chromosome 1 has 1635 genes
Chromosome 3 has 778 genes
Chromosome 3 has 778 genes
Chromosome 3 has 778 genes
Chromosome 3 has 778 genes
Chromosome 4 has 518 genes
Chromosome 4 has 518 genes
Chromosome 4 has 518 genes
Chromosome 4 has 518 genes

I need each duplicate printed once. What's the best way to do this?
DeepSpace


Re: how can I print my hash values once? by ikegami (Patriarch) on Mar 10, 2011 at 05:16 UTC
You're iterating over the values, then proceed to use it as the key. I suspect you want

    foreach my $geneid ( sort { $geneids{$a} <=> $geneids{$b} } keys %geneids ) {
        print OUT "Chromosome $geneid has $geneids{$geneid} genes\n";
    }

Also switched from `cmp` to `<=>` so that 10 comes after 2.

Oh! And you want to move the foreach outside of the while loop.

Update: Added last paragraph.
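Putting both suggestions together (iterate over the keys, and print only after the read loop has finished), the counting step might look like the sketch below. It is not ikegami's code: it reads from an in-memory string instead of CCDS.current.txt, the sample rows are made up, the `count_by_chromosome` helper name is hypothetical, and it assumes, as the original script does, that column 0 holds the chromosome.

```perl
use strict;
use warnings;

# Count how many data rows mention each chromosome (column 0).
# Takes a filehandle whose first line is a header.
sub count_by_chromosome {
    my ($in) = @_;
    my %count;
    my $header = <$in>;            # discard the header line
    while ( my $line = <$in> ) {
        chomp $line;
        my @fields = split /\t/, $line;
        $count{ $fields[0] }++;    # this hash holds counts and nothing else
    }
    return %count;
}

# Made-up sample data standing in for the real file.
my $sample = join '',
    "chromosome\tnc_accession\tgene\n",
    "1\tNC_A\tGENE_A\n",
    "1\tNC_A\tGENE_B\n",
    "10\tNC_B\tGENE_C\n",
    "2\tNC_C\tGENE_D\n",
    "2\tNC_C\tGENE_E\n",
    "2\tNC_C\tGENE_F\n";

open( my $in, '<', \$sample ) or die "Can't open in-memory file: $!";
my %count = count_by_chromosome($in);
close($in);

# Build the report once, after all lines have been read,
# ordered by count (smallest first).
my @report = map  { "Chromosome $_ has $count{$_} genes\n" }
             sort { $count{$a} <=> $count{$b} } keys %count;
print @report;
```

Because the report is built after the `while` loop ends, each chromosome appears exactly once no matter how many rows mention it.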
Re^2: how can I print my hash values once? by lomSpace (Scribe) on Mar 10, 2011 at 14:29 UTC
ikegami, Thanks that was simple enough. I failed to mention that there are counts for the x and y chr. example:

    #!/usr/bin/perl -w
    use strict;
    #open file
    open(my $in,"/Users/mgavibrathwaite/Desktop/CCDS.current.txt") or die " Can't open file: $!";
    #open out file
    open(OUT, ">/Users/mgavibrathwaite/Desktop/genesperchrcnt.txt");
    # initialize the hash
    my %geneids=();
    #open the file and push the info from the designated columns into it
    # remove header
    my $firstline = <$in>;
    chomp $firstline;
    while(<$in>){
        chomp; # remove the newline character
        my @fields = split (/\t/);
        #extract the columns that we are interested in.
        # Populate the key value pairs of the hash with $gene and $id
        $geneids{$fields[2]} = $fields[0];
        # initialize an array to store hash values
        my @chr;
        push @chr, $fields[0];
        #count chromosome number which is the value in the hash
        $geneids{$fields[0]}++;
        next if $geneids{$fields[0]} > 1;
    }
    foreach my $geneid ( sort { $geneids{$a} <=> $geneids{$b} } keys %geneids ) {
        print OUT "Chromosome $geneid has $geneids{$geneid} genes\n";
    }
    close($in);
    close(OUT);
    =cut Output
    Chromosome has X genes
    Chromosome KLHL13 has X genes
    Chromosome UTY has Y genes
    Chromosome SPIN2B has X genes
    Chromosome PIR has X genes
    Chromosome ADRBK2 has 22 genes
    Chromosome SLC2A11 has 22 genes
    Chromosome SELO has 22 genes
    Chromosome PIK3IP1 has 22 genes
    Chromosome 21 has 323 genes
    Chromosome 18 has 358 genes
    Chromosome 13 has 402 genes
    Chromosome 22 has 553 genes
    Chromosome 20 has 724 genes
    Chromosome 15 has 733 genes
    Chromosome 14 has 772 genes
    Chromosome 8 has 827 genes
    Chromosome 4 has 922 genes
    Chromosome 9 has 982 genes
    Chromosome 10 has 1007 genes
    Chromosome 16 has 1009 genes
    Chromosome X has 1045 genes
    Chromosome 5 has 1054 genes
    Chromosome 7 has 1137 genes
    Chromosome 12 has 1283 genes
    Chromosome 6 has 1298 genes
    Chromosome 3 has 1354 genes
    Chromosome 17 has 1412 genes
    Chromosome 11 has 1543 genes
    Chromosome 2 has 1624 genes
    Chromosome 19 has 1660 genes
    Chromosome 1 has 2611 genes
I am only interested in output of the form "Chromosome <num/X/Y> has <num> genes". How can I accomplish that? Thanks Ikegami! DeepSpace
Re^3: how can I print my hash values once? by ikegami (Patriarch) on Mar 10, 2011 at 19:55 UTC
Just a quick note, you seem to be placing some garbage in your hash:

    Chromosome has X genes
    Chromosome KLHL13 has X genes
    Chromosome UTY has Y genes
    Chromosome SPIN2B has X genes
    Chromosome PIR has X genes

The values in the hash should only be counts. Once you fix that bug, you can use

    foreach my $geneid ( sort { $geneids{$a} <=> $geneids{$b} } keys %geneids ) {
        if ($geneid =~ /^(?:[0-9]+|X|Y)\z/) {
            print OUT "Chromosome $geneid has $geneids{$geneid} genes\n";
        }
    }
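The stray lines come from `$geneids{$fields[2]} = $fields[0];`, which stores a gene-to-chromosome entry in the same hash that holds the counts. One possible fix, sketched here under the assumption that columns 0 and 2 are chromosome and gene as in the posted script, is to keep two separate hashes. The rows, the `Un` value and the names `%chr_of_gene` and `%gene_count` are made up for illustration.

```perl
use strict;
use warnings;

# Made-up rows: [chromosome, accession, gene]. The last row mimics
# a line whose chromosome field is not a number, X or Y.
my @rows = (
    [ '1',  'NC_A', 'GENE_A' ],
    [ '1',  'NC_A', 'GENE_B' ],
    [ 'X',  'NC_X', 'KLHL13' ],
    [ 'X',  'NC_X', 'PIR' ],
    [ 'X',  'NC_X', 'SPIN2B' ],
    [ 'Y',  'NC_Y', 'UTY' ],
    [ 'Un', 'NC_U', 'GENE_U' ],
);

my %chr_of_gene;    # gene => chromosome (lookup only)
my %gene_count;     # chromosome => number of rows seen

for my $row (@rows) {
    my ( $chr, undef, $gene ) = @$row;
    $chr_of_gene{$gene} = $chr;
    $gene_count{$chr}++;
}

# Keep only keys that are a number, X or Y, then sort by count.
my @wanted = sort { $gene_count{$a} <=> $gene_count{$b} }
             grep { /^(?:[0-9]+|X|Y)\z/ } keys %gene_count;

print "Chromosome $_ has $gene_count{$_} genes\n" for @wanted;
```

With the counts in their own hash, every value is numeric, so the `<=>` comparison no longer has to deal with values such as `X` or `Y`.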
Re^4: how can I print my hash values once? by lomSpace (Scribe) on Mar 11, 2011 at 15:17 UTC
Re^5: how can I print my hash values once? by ikegami (Patriarch) on Mar 11, 2011 at 21:50 UTC
Re^3: how can I print my hash values once? by umasuresh (Hermit) on Mar 10, 2011 at 14:59 UTC
Untested: You could add something like

    if ($geneid =~ /[0-9]+|X|Y/ && $geneids{$geneid} =~/[0-9]+/) {
        print "...";
    }
Re: how can I print my hash values once? by roboticus (Chancellor) on Mar 10, 2011 at 13:52 UTC
lomSpace:

If the lines are truly duplicates, and you don't care about the ordering, then I think I'd leverage the sort utility to remove duplicates and just open[1] the file with:

    open(my $in, '-|', "sort -u /Users/mydir/Desktop/CCDS.current.txt") or die " Can't open file: $!";

Then your code can be considerably shorter/simpler.

Note: [1] From my reading, I think this is how to use an input pipe with the three-argument form of open, but I've not used it before.

...roboticus

When your only tool is a hammer, all problems look like your thumb.
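The footnote has the syntax right: with the three-argument form, a mode of `'-|'` tells `open` to run a command and read its standard output. A minimal sketch of that idea follows. It assumes a Unix-like system with a `sort` utility on the PATH, uses a made-up temporary file rather than the real data, and passes the command as a list (a variation on the single-string form above) so the path needs no shell quoting.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small made-up file containing duplicate lines.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "b\n", "a\n", "b\n", "a\n", "c\n";
close($tmp) or die "Can't close $path: $!";

# '-|' means: run the command and read from its standard output.
# The list form runs sort directly, without going through a shell.
open( my $in, '-|', 'sort', '-u', $path )
    or die "Can't run sort: $!";
chomp( my @lines = <$in> );
close($in) or die "sort failed: $! (exit status $?)";

print "$_\n" for @lines;
```

Checking the return value of `close` matters on a pipe, because that is where a non-zero exit status from the command is reported.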