in reply to Re: Statistics on Tab Delimited File

Thank you so much! I have been working on a solution all week. Your answer is close to what I was looking for, but I did not explain the problem well. That is my fault.

__DATA__
Contig Organism Eval Length MappedLength
contig1 test1 1e-28 28 55
contig1 test2 1e-10 22 54
contig2 test1 1e-10 24 78
contig3 test2 10 78 57
contig4 test3 1e-5 200 55
contig4 test2 10 100 43
I am trying to get this output (frequencies and medians computed from the sample above):
Organism Frequency EvalMedian LengthMedian MappedMedian
test2 3 10 78 54
test1 2 5e-11 26 66.5
test3 1 1e-5 200 55

"Frequency" is the number of times the organism appears in the file. When an organism appears more than once, I take all of its values in each column and find the median of those combined values for that organism (test1, test2, etc.). If an "Organism" appears only once, its median values are simply the values from that one row.
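A minimal Perl sketch of that grouping, assuming the column order shown in the sample (Contig, Organism, Eval, Length, MappedLength) and the usual median definition: the middle value for an odd count, the mean of the two middle values for an even count. The helper names `summarize` and `median` are hypothetical, not from any module, and the sample is read from a heredoc rather than a file or `__DATA__`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# median(LIST): middle value of the numerically sorted list, or the mean of
# the two middle values when the count is even. Returns undef for an empty list.
sub median {
    my @sorted = sort { $a <=> $b } @_;
    return undef unless @sorted;
    my $mid = int(@sorted / 2);
    return @sorted % 2 ? $sorted[$mid]
                       : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

# summarize(\@lines): group rows by organism, counting rows and collecting
# each numeric column. Assumes columns Contig, Organism, Eval, Length, MappedLength.
sub summarize {
    my ($lines) = @_;
    my %by_org;
    for my $line (@$lines) {
        chomp $line;
        next if $line =~ /^\s*$/ or $line =~ /^Contig\b/;   # skip blank and header lines
        # The sample below is space-separated as posted; for the real
        # tab-delimited file use  split /\t/, $line  instead.
        my (undef, $org, $eval, $len, $mapped) = split ' ', $line;
        my $rec = $by_org{$org} ||= { count => 0, Eval => [], Length => [], MappedLength => [] };
        $rec->{count}++;
        push @{ $rec->{Eval} },         $eval;
        push @{ $rec->{Length} },       $len;
        push @{ $rec->{MappedLength} }, $mapped;
    }
    return \%by_org;
}

my $sample = <<'END';
Contig Organism Eval Length MappedLength
contig1 test1 1e-28 28 55
contig1 test2 1e-10 22 54
contig2 test1 1e-10 24 78
contig3 test2 10 78 57
contig4 test3 1e-5 200 55
contig4 test2 10 100 43
END

my $stats = summarize([ split /\n/, $sample ]);

# Print one tab-separated row per organism, most frequent first, ties by name.
print join("\t", qw(Organism Frequency EvalMedian LengthMedian MappedMedian)), "\n";
for my $org (sort { $stats->{$b}{count} <=> $stats->{$a}{count} or $a cmp $b } keys %$stats) {
    my $rec = $stats->{$org};
    print join("\t", $org, $rec->{count},
               map { median(@{ $rec->{$_} }) } qw(Eval Length MappedLength)), "\n";
}
```

On the sample, test2 comes first with frequency 3 and medians 10, 78 and 54; test1 has two rows, so each of its medians is the mean of two values (5e-11, 26 and 66.5); test3 has a single row, so its medians are just that row's values.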

I see that column one ([0]) is not in order, but that does not matter for the final output. The "test1"-style names will actually be long scientific names.
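Long scientific names will probably contain spaces (an assumption; the posted sample only has single-word names). In a tab-delimited file, splitting each line on tabs rather than on any whitespace keeps such a name in one field. A small illustration with a made-up row:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical tab-delimited row whose organism name contains spaces.
my $row = join "\t", 'contig5', 'Escherichia coli K-12', '1e-30', 150, 140;

my @by_whitespace = split ' ', $row;   # splits inside the name as well
my @by_tab        = split /\t/, $row;  # keeps the name as a single field

print scalar(@by_whitespace), " fields: organism = '$by_whitespace[1]'\n";  # 7 fields: organism = 'Escherichia'
print scalar(@by_tab),        " fields: organism = '$by_tab[1]'\n";         # 5 fields: organism = 'Escherichia coli K-12'
```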

Thanks