Number of values for each key in hash

Sofie has asked for the wisdom of the Perl Monks concerning the following question:

Re: Number of values for each key in hash by haj (Vicar) on Feb 29, 2020 at 12:26 UTC
There are a few gotchas in your code. Let me modify it like this: `use strict; use warnings; my %GeneCount = (); open (GENETYPE, "GeneType.txt") or die "Could not open file: '$!'"; my $header = <GENETYPE>; while (<GENETYPE>) { chomp; my ($GeneName, $GeneType) = split (/\t/, $_); $GeneCount{$GeneType}++; } for my $type (sort keys %GeneCount) { print "$type: $GeneCount{$type}\n"; }`

So what did I change?

I started the program with `use strict;` and `use warnings;`, which is a good habit and will save a lot of time in the long run. The only downside is that I now have to declare `my %GeneCount = ()` before using it.

In the `open` statement I included the reason for the failure in the error message. There's also the opportunity to use the three-parameter form of open and a lexical file handle, which I let pass, because your code is correct (but slightly out of fashion).

Instead of removing the header on every pass through the loop, I just read the header once, before even entering the loop.

I added `chomp`, which removes the newline that would otherwise be at the end of every gene type you read.

Most important for your logic: I changed the hash so that the types are the keys and the counts are the values. I seem to recall that older versions of Perl (I'm using 5.28) issued some warnings about `uninitialized $GeneCount{pseudogene}`. To get rid of these you can add the line `no warnings "uninitialized"` before entering the loop.

And that's it. The rest is just printing out the collected values.

If you are a beginner in Perl, you might also check out the books at https://learn.perl.org/books/; they are fun to read.
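For anyone who wants to try the counting idiom without the input file, here is a minimal sketch. The sample lines and gene names are made up and merely stand in for GeneType.txt. It also installs a `__WARN__` handler to see whether `++` on a key that does not exist yet produces a warning. As far as I know, perlsyn lists `++` among the operators that are exempt from "uninitialized" warnings, so the handler should stay empty here.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of GeneType.txt
# (tab-separated: gene name, gene type). The names are invented.
my @lines = (
    "GeneName\tGeneType\n",
    "geneA\tprotein_coding\n",
    "geneB\tpseudogene\n",
    "geneC\tprotein_coding\n",
);

# Collect any warnings so we can check whether ++ triggers one.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my %GeneCount;
shift @lines;                        # drop the header once, before the loop
for my $line (@lines) {
    chomp $line;
    my ($GeneName, $GeneType) = split /\t/, $line;
    $GeneCount{$GeneType}++;         # the first ++ on a missing key gives 1
}

for my $type (sort keys %GeneCount) {
    print "$type: $GeneCount{$type}\n";
}
print "warnings seen: ", scalar(@warnings), "\n";
```

If the handler does collect something on your Perl, the `no warnings "uninitialized"` line mentioned above is the way to silence it.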
Re^2: Number of values for each key in hash by Anonymous Monk on Feb 29, 2020 at 14:21 UTC
Possible additional tweaks:

There is no need to initialize `%GeneCount` to `()`. That is the value it takes on anyway when declared.

You may want to cultivate the habit of using lexical variables as file handles (i.e. `open my $genetype, ... or die ...`). Bareword file handles are global.

You may want to cultivate the habit of using three-argument opens (i.e. `open my $genetype, '<', 'GeneType.txt' or die ...`). This is the only way you can specify things like file encoding.

Purely as a style thing, built-ins like `open()` and `split()` do not need parentheses, except for precedence. Whoever wrote perlopentut uses parentheses because that author also chose to use the tightly-binding `||` operator rather than the loosely-binding `or` operator for error checking.

None of these are required to make the presented script work.
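A minimal sketch of how these tweaks might look when put together. The sub name `count_gene_types` and the sample data are my own inventions for illustration. So that the snippet runs without GeneType.txt, it opens an in-memory string; the three-argument form and the lexical handle are the same as for a real file.

```perl
use strict;
use warnings;

# Count gene types from an already opened, tab-separated file handle.
# (Hypothetical helper, not part of the original script.)
sub count_gene_types {
    my ($fh) = @_;
    my %count;                       # no "= ()" needed, it starts out empty
    my $header = <$fh>;              # read and discard the header line
    while (my $line = <$fh>) {
        chomp $line;
        my ($name, $type) = split /\t/, $line;
        next unless defined $type;   # skip blank or malformed lines
        $count{$type}++;
    }
    return %count;
}

# In-memory stand-in for GeneType.txt; the gene names are invented.
my $sample = "GeneName\tGeneType\n"
           . "geneA\tprotein_coding\n"
           . "geneB\tpseudogene\n"
           . "geneC\tprotein_coding\n";

# Three-argument open with a lexical handle. For the real file this would be
#   open my $genetype, '<', 'GeneType.txt' or die "Could not open file: $!";
# and an encoding could go into the mode, e.g. '<:encoding(UTF-8)'.
open my $genetype, '<', \$sample
    or die "Could not open sample data: $!";
my %GeneCount = count_gene_types($genetype);
close $genetype;

print "$_: $GeneCount{$_}\n" for sort keys %GeneCount;
```

Because `$genetype` is a lexical, it can be passed to a sub like any other scalar, which is awkward to do with a bareword handle.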
Re^2: Number of values for each key in hash by Sofie (Acolyte) on Feb 29, 2020 at 12:43 UTC
That works perfectly, thanks!
Re: Number of values for each key in hash by hippo (Archbishop) on Feb 29, 2020 at 11:51 UTC
Your question sounds very much like "How do I extract the number of times a value appears in a hash?". Does one of the answers there solve it for you? The one contributed by BlaisePascal seems like a good place to start.
Re: Number of values for each key in hash by bliako (Abbot) on Feb 29, 2020 at 12:55 UTC
The link provided by hippo seems a good place to start. The correspondence to your case can be seen by comparing `($GeneName, $GeneType) = split (/\t/, $_);` (your program) with `my ($ip, $size) = split /:/;` (the other program).

Once you have practiced building and searching the hash, consider this: in a hash the most efficient search is by its keys. If you need to search by value, then consider redesigning your hash to use the values as keys (of course this is not always possible, because the keys of a hash must be unique). In a situation where a key can be associated with multiple values (and must absolutely be used as a key, i.e. the hash can't be redesigned), we can use an array to hold all the values, like `$hash{akey} = ['v1', 'v2', 'v3'];`, or even another hash, like `$hash{akey} = {'k1' => ['v1k1','v2k1'], 'k2' => ['v1k2','v2k2']};`. There is also the possibility of arrays of hashes, arrays of arrays, and so on. With nested data structures the possibilities are endless. I mention this because in your case I think the key should be the gene type (and not the gene name) and the value should be an array of gene names.

Also, `my $scalar = delete $GeneHash{GeneName};` will remove the gene name you just added to your hash! You will end up with nothing. You probably wanted to skip the first line of the file. Do that just once, i.e. before the loop, like `open (GENETYPE, "GeneType.txt") or die "Could not open file"; my $header = <GENETYPE>;`, which reads the first line of the file and saves it in that variable.

Finally, shouldn't `print (each %GeneHash);` be outside the file-reading, hash-generating loop? And this would do just fine: `while( my ($k,$v) = each %GeneHash ){ print "$k=>$v\n"; }`

Having said all this, and with you going through it to get some experience, I want to mention the existence of BioPerl, which is especially designed for bioinformatics and does tasks like yours pretty well. Here is something relevant: https://bioperl.org/howtos/Beginners_HOWTO.html#item19 . But even with BioPerl you will need to know your hashes.

bw, bliako
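To make the "gene type as key, array of gene names as value" idea concrete, here is a small sketch. The hash name `%GenesByType` and the sample pairs are invented for illustration. In the real script the pairs would come from `split` inside the file-reading loop.

```perl
use strict;
use warnings;

# Hypothetical (gene name, gene type) pairs, as split would produce them.
my @pairs = (
    [ geneA => 'protein_coding' ],
    [ geneB => 'pseudogene'     ],
    [ geneC => 'protein_coding' ],
);

# Hash of arrays: each gene type maps to the list of its gene names.
my %GenesByType;
for my $pair (@pairs) {
    my ($name, $type) = @$pair;
    push @{ $GenesByType{$type} }, $name;   # the array springs into existence on first push
}

# The count per type is just the size of each array.
for my $type (sort keys %GenesByType) {
    my @names = @{ $GenesByType{$type} };
    printf "%s (%d): %s\n", $type, scalar(@names), join(', ', @names);
}
```

This keeps the gene names around as well as the counts. If only the counts are needed, a plain counter per type is simpler.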
Re: Number of values for each key in hash by BillKSmith (Monsignor) on Feb 29, 2020 at 21:20 UTC
Use grep: `my $count = grep /$specific/, values(%GeneHash);`

Bill
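A small sketch of the grep approach, assuming `%GeneHash` maps gene names to gene types (the sample entries are made up). In scalar context `grep` returns the number of matches. Note that a regex match also counts values that merely contain the search string, so an `eq` comparison may be what you want for an exact type.

```perl
use strict;
use warnings;

# Hypothetical hash of gene name => gene type.
my %GeneHash = (
    geneA => 'protein_coding',
    geneB => 'pseudogene',
    geneC => 'processed_pseudogene',
    geneD => 'protein_coding',
);

my $specific = 'pseudogene';

# Regex match: counts every value that contains the string.
# \Q...\E keeps any regex metacharacters in $specific literal.
my $regex_count = grep { /\Q$specific\E/ } values %GeneHash;

# Exact match: counts only values equal to the string.
my $exact_count = grep { $_ eq $specific } values %GeneHash;

print "regex matches: $regex_count\n";   # pseudogene and processed_pseudogene
print "exact matches: $exact_count\n";   # pseudogene only
```

This scans all values on every call, which is fine for a one-off count. For counts of every type at once, a counting hash keyed by type does it in a single pass.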