I am trying to parse a tab-delimited text file that contains sequence data and find the median of each numeric column per organism. I posted about this a few days ago (http://www.perlmonks.org/?node_id=812285) and have been working toward a solution since. Toolic helped me last time and posted a working solution, which I appreciate, but I still need the frequency per organism and the output grouped by organism, and I am still lost. Thanks again for any help.
Here is an example of my input data:

    contig1 AC344 organism1 1e-1 122 45

The first two columns are correct, but it breaks apart when I try to get the medians. Here is an example of my output so far:

    Organism Frequency Median_Eval Median_Contig_Length Median_Mapped_Length

Here is my code so far:
    use strict;
    use warnings;
    use Acme::Tools;

    my (%count, %organisms, %med, %number);
    my $ref_filelist = $ARGV[0];
    my ($contig, $accession, $organism, $eval, $con_length, $map_length);

    open(FILELIST, $ref_filelist ) or die "Could not open Reference filelist...($!)";

    print "Organism\tFrequency\tMedian_Eval\tMedian_Contig_Length\tMedian_Mapped_Length\n";

    while (<FILELIST>){
        ( $contig, $accession, $organism, $eval, $con_length, $map_length ) = split ( '\t',);
        #my $median = $eval[($#eval / 2)];
        my $med = median(@{$organisms{$organism}});
        $med{$organism} = $med;
        #$organisms{$organism} = $eval;
        my $number = ++$count{$organism};
        $number{$organism} = $number;
    }

    foreach $organism (sort {$number{$a} <=> $number{$b}} keys %organisms) {
        print "$organism:\t$number{$organism}\t$organisms{$organism}\t$med{$organism}\n" ;
    }
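For comparison, here is a minimal sketch of one way the per-organism medians could be computed. It is not Toolic's solution. It assumes the six tab-separated columns shown in the input example, uses a small hand-written `median` sub in place of Acme::Tools, and the helper names `collect` and `summarize` are hypothetical. The main difference from the posted loop is that each row's values are first pushed onto per-organism arrays, and the medians are taken only after the whole file has been read; the posted loop never stores any column values in %organisms, so median() has nothing to work with. When no file name is given, the sketch falls back to a few made-up sample rows (only the first row comes from the input example).

```perl
use strict;
use warnings;

# Median of a list of numbers; returns undef for an empty list.
# For an even count, the two middle values are averaged.
sub median {
    my @sorted = sort { $a <=> $b } @_;
    return undef unless @sorted;
    my $mid = int(@sorted / 2);
    return @sorted % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

# Read tab-delimited rows and collect the three numeric columns per organism.
# Assumed column order: contig, accession, organism, e-value,
# contig length, mapped length.
sub collect {
    my ($fh) = @_;
    my %data;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my (undef, undef, $organism, $eval, $con_length, $map_length)
            = split /\t/, $line;
        push @{ $data{$organism}{eval} },       $eval;
        push @{ $data{$organism}{con_length} }, $con_length;
        push @{ $data{$organism}{map_length} }, $map_length;
    }
    return \%data;
}

# Build one row per organism: name, frequency, and the three medians.
# Rows are ordered by ascending frequency, then by organism name.
sub summarize {
    my ($data) = @_;
    my @rows;
    for my $organism (
        sort { @{ $data->{$a}{eval} } <=> @{ $data->{$b}{eval} } || $a cmp $b }
        keys %$data
    ) {
        my $d = $data->{$organism};
        push @rows, [
            $organism,
            scalar @{ $d->{eval} },
            median(@{ $d->{eval} }),
            median(@{ $d->{con_length} }),
            median(@{ $d->{map_length} }),
        ];
    }
    return @rows;
}

my $fh;
if (@ARGV) {
    open $fh, '<', $ARGV[0] or die "Could not open $ARGV[0]: $!";
}
else {
    # Made-up sample rows for illustration; only the first is from the post.
    my $sample = join '', map { join("\t", @$_) . "\n" } (
        [qw(contig1 AC344 organism1 1e-1 122 45)],
        [qw(contig2 AC345 organism1 1e-3 200 60)],
        [qw(contig3 AC346 organism1 1e-5 150 50)],
        [qw(contig4 AC400 organism2 1e-2 300 100)],
    );
    open $fh, '<', \$sample or die "Could not open sample data: $!";
}

print join("\t", qw(Organism Frequency Median_Eval
                    Median_Contig_Length Median_Mapped_Length)), "\n";
print join("\t", @$_), "\n" for summarize(collect($fh));
close $fh;
```

The rows are sorted by ascending frequency, as in the posted foreach loop, with the organism name as a tie-breaker so the order is stable between runs.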
In reply to Statistics via hash- NCBI BLAST Tab Delimited file by Paragod28