Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hello -- I am working on some coding for a bioinformatics-type project. (Though the Perl code shouldn't be terribly complicated.) I have a (large) tab-delimited file that I would like to read line by line, and for each line I need to update information in a hash. Column 19 is the name of a gene. The rest of the columns describe a single variant within that gene. Genes can have multiple variants (thus, many lines of the file will refer to the same gene and have the same value in column 19). I would like to read each line of the file, store the gene name (column 19) as the key in a hash, and then add 1 to the value. The end result should be a hash with ~20,000 keys, with corresponding values representing the number of variants (that is, the number of lines in the file describing that particular gene). I am not interested in the type of variant, just the number. Here is what I have so far. The file with genes and variant information is to be given on the command line.
#!/usr/bin/perl
use strict;
use warnings;

my $filename = $ARGV{0};
open( my $fh => $filename) || die "Cannot open $filename: $!";

my %gene_count;
while(my $line = <$fh>) {
    my @row = split("\t",$line);
    $gene_count{ $row[18] } = ++;
}
close($fh);
Here is what I am struggling with:

1) Can I write over the value of a key? ++ doesn't seem to be appropriate.
2) How do I start the value at 0 or 1? Perhaps this is not a simple enough problem for a single loop.
3) Will I be encountering issues once I get to the second time a gene is listed?

Any help/direction/advice is much appreciated!

Cheers, A
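
For reference, here is a minimal sketch of how the counting loop could look, in answer to the three questions above. It relies on two standard Perl behaviours: `++` can be applied directly to a hash element as a postfix operator, and a key that does not exist yet is created on first use with its undefined value treated as 0, so no separate initialisation pass is needed and repeated genes simply keep incrementing. The sketch also reads the filename from `$ARGV[0]` (an array element) rather than `$ARGV{0}`. The helper name `count_genes`, the three-argument `open`, and the decision to skip lines with no 19th column are assumptions made for illustration, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_genes is a hypothetical helper: it reads tab-delimited lines from an
# open filehandle and returns a hash reference mapping gene name (column 19,
# array index 18) to the number of lines that mention it.
sub count_genes {
    my ($fh) = @_;
    my %gene_count;
    while ( my $line = <$fh> ) {
        chomp $line;                        # drop the trailing newline
        my @row = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        # Assumption: lines without a gene name in column 19 are skipped.
        next unless defined $row[18] && length $row[18];
        # A missing key springs into existence here; undef counts as 0,
        # so the first ++ leaves the value at 1.
        $gene_count{ $row[18] }++;
    }
    return \%gene_count;
}

# Only run the file-handling part when a filename was given.
if (@ARGV) {
    my $filename = $ARGV[0];
    open( my $fh, '<', $filename ) or die "Cannot open $filename: $!";
    my $counts = count_genes($fh);
    close($fh);

    # Print one "gene<TAB>count" line per gene, sorted by gene name.
    print "$_\t$counts->{$_}\n" for sort keys %$counts;
}
```

Assuming the script is saved under a name such as `count_genes.pl` (a placeholder), it would be run as `perl count_genes.pl variants.tsv`, where `variants.tsv` stands for the tab-delimited input file.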