in reply to Statistics via hash- NCBI BLAST Tab Delimited file

You've certainly made a good start. Here are a few pointers to consider:

1. Your input file "contig1 ac346... etc" wasn't in a code block, so the tab characters don't render properly ...

2. I'm fairly sure others would agree that declaring a scalar with the same name as a hash (or an array) is confusing and asking for trouble. $med and %med are separate variables as far as Perl is concerned, so in your particular case Perl does do what you intended. However, if you extend this script later on, it is easy to mix the two up and have it break all of a sudden.

    my $med = median(@{$organisms{$organism}});   # bad form: same name as %med declared in the outer scope.
                                                   # Use a different name for this variable, e.g. $med_calculated.
    $med{$organism} = $med;

(The same goes for your $number assignment section.)

3. When posting scripts on PM, use the <DATA> handle for your source data. It makes things a little more portable, and easier for others to reproduce what you're seeing and then offer assistance or a solution.

4. I'm not sure why you're calculating the median of $organisms{$organism} before it's been populated with anything... maybe you need another loop after you've finished reading in your source data...?

5. Your header line has 5 columns, but your data printing loop:

    foreach $organism (sort {$number{$a} <=> $number{$b}} keys %organisms) {
        print "$organism:\t$number{$organism}\t$organisms{$organism}\t$med{$organism}\n";
    }

only prints 4 values. Something quite innocent like this is often an "oh yeah, I know that's wrong, but I'll fix it later". Make things easier for yourself. The K.I.S.S. acronym is a very wise one...

6. As you've discovered, the median function takes a list as its input:

    print median(1, 100, 101);   # 100

Since Perl's hashes and arrays can only ever hold scalars (including references), something isn't quite right with the median calculation... A good place to start would be to break down what you've got in 1 step into 3 or 4. This also makes debugging easier, as you can insert warn $variable_name statements to confirm that the variables contain the values you are expecting.
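To illustrate point 6, here is a rough sketch of what "one step broken into several" might look like. It is not your script, just an illustration: simple_median is a made-up stand-in for Acme::Tools' median (so the snippet runs without CPAN modules, and it may not behave identically for even-sized lists), and the e-values are hard-coded from your organism2 rows.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Acme::Tools::median, so this sketch has no
# CPAN dependencies. For an even count it averages the two middle values.
sub simple_median {
    my @sorted = sort { $a <=> $b } @_;
    return unless @sorted;
    my $mid = int(@sorted / 2);
    return @sorted % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

my %organisms;

# Step 1: a hash value can only be a scalar, so keep the e-values in an
# array reference stored under the organism name.
push @{ $organisms{organism2} }, $_ for qw(1e-102 1e-100 1e-114 1e-111);

# Step 2: pull the reference out and confirm it is an array reference.
my $evals_ref = $organisms{organism2};
warn "ref type: ", ref($evals_ref), "\n";

# Step 3: dereference it into a plain list and check the contents.
my @evals = @{$evals_ref};
warn "values: @evals\n";

# Step 4: only now hand the list to the median function.
my $med_calculated = simple_median(@evals);
warn "median: $med_calculated\n";
```

Each warn goes to STDERR, so it won't get mixed up with your tab-delimited output on STDOUT.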


use strict;
use warnings;
use Acme::Tools;

my (%count, %organisms, %med, %number);
my ($contig, $accession, $organism, $eval, $con_length, $map_length);

#my $ref_filelist = $ARGV[0];
#open(FILELIST, $ref_filelist )
#    or die "Could not open Reference filelist...($!)";

print "Organism\tFrequency\tMedian_Eval\tMedian_Contig_Length\tMedian_Mapped_Length\n";

#while (<FILELIST>){
while (<DATA>){
    ( $contig, $accession, $organism, $eval, $con_length, $map_length ) = split ( '\t',);

    #my $median = $eval[($#eval / 2)];
    my $med_calculated = median(@{$organisms{$organism}});
    $med{$organism} = $med_calculated;
    #$organisms{$organism} = $eval;

    my $number_calculated = ++$count{$organism};
    $number{$organism} = $number_calculated;
}

foreach $organism (sort {$number{$a} <=> $number{$b}} keys %organisms) {
    print "$organism:\t$number{$organism}\t$organisms{$organism}\t$med{$organism}\n" ;
}

__DATA__
contig1	AC344	organism1	1e-1	122	45
contig1	AC344	organism1	1e-2	122	45
contig1	AC346	organism2	1e-102	122	46
contig1	Ac346	organism2	1e-100	122	46
contig1	Ac346	organism2	1e-114	122	46
contig1	Ac346	organism2	1e-111	122	46
contig2	NC333	organism3	1e-2	155	90
contig3	NC444	organism4	1	188	50
contig3	NC444	organism4	12	188	50
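For what it's worth, here is one way the "another loop" idea from point 4 could look: read everything first, then calculate. Treat it as a sketch under my own assumptions rather than the answer. The names collect_evals, summarise and median_of are invented for this example, median_of is a home-made substitute for Acme::Tools' median, and only the frequency and median e-value columns are handled. The sample rows are built with join("\t", ...) and read through an in-memory filehandle so the tabs can't get lost in the post; passing \*DATA to collect_evals instead should work the same way.

```perl
use strict;
use warnings;

# Hypothetical helper standing in for Acme::Tools::median.
sub median_of {
    my @sorted = sort { $a <=> $b } @_;
    return unless @sorted;
    my $mid = int(@sorted / 2);
    return @sorted % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

# First loop: read tab-delimited rows and collect the e-values (4th
# column) per organism (3rd column) as a hash of array references.
sub collect_evals {
    my ($fh) = @_;
    my %evals_for;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my ($organism, $eval) = (split /\t/, $line)[2, 3];
        push @{ $evals_for{$organism} }, $eval;
    }
    return \%evals_for;
}

# Second loop: the lists are complete now, so the statistics are safe.
sub summarise {
    my ($evals_for) = @_;
    my %summary;
    for my $organism (keys %$evals_for) {
        my @evals = @{ $evals_for->{$organism} };
        $summary{$organism} = {
            frequency   => scalar @evals,
            median_eval => median_of(@evals),
        };
    }
    return \%summary;
}

# Sample rows taken from the original post.
my $sample = join '', map { join("\t", @$_) . "\n" } (
    [qw(contig1 AC344 organism1 1e-1   122 45)],
    [qw(contig1 AC344 organism1 1e-2   122 45)],
    [qw(contig1 AC346 organism2 1e-102 122 46)],
    [qw(contig1 Ac346 organism2 1e-100 122 46)],
    [qw(contig1 Ac346 organism2 1e-114 122 46)],
    [qw(contig1 Ac346 organism2 1e-111 122 46)],
    [qw(contig2 NC333 organism3 1e-2   155 90)],
    [qw(contig3 NC444 organism4 1      188 50)],
    [qw(contig3 NC444 organism4 12     188 50)],
);

open my $fh, '<', \$sample or die "Could not open in-memory data: $!";
my $summary = summarise(collect_evals($fh));
close $fh;

print "Organism\tFrequency\tMedian_Eval\n";
for my $organism (
    sort { $summary->{$a}{frequency} <=> $summary->{$b}{frequency} || $a cmp $b }
    keys %$summary
) {
    my $row = $summary->{$organism};
    print join("\t", $organism, $row->{frequency}, $row->{median_eval}), "\n";
}
```

Keeping the reading and the calculating in separate subs also means the header line and the print loop only have to agree with one data structure, which helps with point 5.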