Dear Monks,
I am having a mysterious problem that I can't figure out. I am using Statistics::TTest to calculate p-values for several thousand pairs of distributions of numbers. I'm using these p-values to create a volcano plot, and when I plot the p-values I observe a strange artifact where many of the points are getting the same p-value. After some sleuthing, I can re-create the phenomenon with the four pairs of numbers in the code below. When I calculate p-values for these pairs in Excel, the values are all quite distinct (different by several orders of magnitude) but using Statistics::TTest I'm getting EXACTLY the same value for each pair.
The p-values are very small (around 1.6e-12) so I wonder if this ins't some sort of precision issue... but I can't figure it out. If you run the code below, it will display four identical p-values (T-test probabilities, "t_prob") despite the fact that the true p-values range from 1.7e-19 to 2.8e-29.
I have tried doing something similar using Statistics::Distributions but I had the same problem, however I think Statistics::TTest relies on Statistics::Distributions for these calculations. I was not able to find any other modules that perform this calculation.
I should note that the vast majority (99%) of pairs of distributions are getting correct p-values, it's just a handful that produce this strange collision on a wrong value.
Does anyone out there have any insight into what's going wrong? Help is very much appreciated, thanks!
#!/usr/bin/perl -w use strict; use Statistics::TTest; my %datasets = (); @{$datasets{a}[0]} = (0.722466024,0.925999419,1,1.049630768,1.05658352 +8,1.10433666,1.13093087,1.150559677,1.220329955,1.316145742,1.3334237 +34,1.63691458,1.691534165,0.713695815,0.815575429,0.918386234,0.92599 +9419,0.941106311,0.948600847,0.970853654,0.98550043,0.98550043,1,1.02 +8569152,1.042644337,1.10433666,1.117695043,1.269033146,1.286881148,1. +298658316,1.575312331); @{$datasets{a}[1]} = (-0.49410907,-0.358453971,-0.321928095,-0.2863041 +85,-0.200912694,-0.200912694,-0.168122759,-0.120294234,-0.120294234,- +0.104697379,-0.104697379,-0.074000581,-0.058893689,-0.577766999,-0.51 +4573173,-0.514573173,-0.49410907,-0.358453971,-0.358453971,-0.3040061 +87,-0.251538767,-0.184424571); @{$datasets{b}[0]} = (-0.434402824,-0.286304185,-0.251538767,-0.058893 +689,-0.043943348,-0.043943348,0.084064265,0.163498732,0.23878686,0.23 +878686,0.310340121,0.839959587,0.879705766,-0.556393349,-0.268816758, +-0.251538767,-0.152003093,-0.104697379,-0.089267338,-0.029146346,-0.0 +29146346,0,0.070389328,0.084064265,0.097610797,0.124328135,0.13750352 +4,0.189033824,0.189033824,0.214124805,0.214124805,0.214124805,0.32192 +8095,0.333423734,0.367371066,0.40053793,0.411426246,0.443606651,0.516 +015147,0.669026766,0.713695815); @{$datasets{b}[1]} = (0.782408565,0.799087306,0.82374936,0.887525271,0 +.925999419,0.933572638,0.956056652,0.97819563,0.98550043,1.021479727, +1.084064265,1.097610797,1.13093087,1.150559677,1.15704371,1.176322773 +,1.182692298,1.22650853,1.286881148,1.292781749,1.310340121,1.4594316 +19,1.485426827,1.521050737,1.59454855,1.695993813,1.713695815,1.72683 +1217,0.40053793,0.411426246,0.59454855,0.925999419,0.941106311,0.9486 +00847,0.98550043,1.028569152,1.070389328,1.117695043,1.124328135,1.22 +0329955,1.316145742,1.744161096); @{$datasets{c}[0]} = (-0.043943348,-0.029146346,-0.01449957,0.02856915 +2,0.097610797,0.124328135,0.176322773,0.201633861,0.263034406,-0.8624 +96476,-0.104697379,0.084064265,0.084064265,0.084064265,0.124328135,0. +124328135,0.163498732,0.263034406,0.275007047,0.286881148,0.321928095 +,0.333423734); @{$datasets{c}[1]} = (-2.64385619,-2.556393349,-2.473931188,-2.3959286 +76,-2.395928676,-2.395928676,-2.321928095,-2.321928095,-2.321928095,- +2.251538767,-2.251538767,-2.184424571,-2.120294234,-2,-0.535331733,-1 +.64385619,-1.556393349,-1.514573173,-1.514573173,-1.473931188,-1.4344 +02824,-1.434402824,-1.395928676,-1.395928676,-1.395928676,-1.39592867 +6,-1.358453971,-1.358453971,-1.358453971,-1.358453971,-1.358453971,-1 +.321928095,-1.286304185,-1.286304185,-1.286304185,-1.251538767,-1.217 +591435,-1.120294234,-1); @{$datasets{d}[0]} = (0.933572638261024,0.948600847493356,0.9486008474 +93356,0.970853654340483,0.978195629681652,1.111031312388740,1.1505596 +76575380,1.416839741912830,0.731183241572200,0.790772037862000,0.8155 +75428862573,0.855989697308481,0.871843648509318,0.895302621333307,0.9 +33572638261024,0.941106310946431,0.948600847493356,0.956056652412403, +0.970853654340483,0.992768430768924,1.000000000000000,1.0635029423061 +60,1.226508529808680,1.269033146455240,1.298658315564520,1.7048719644 +56350); @{$datasets{d}[1]} = (-0.473931188332412,0.028569152196771,0.042644337 +408494,0.056583528366368,0.070389327891398,0.084064264788475,0.097610 +796626422,0.111031312388744,0.454175893185802,0.454175893185802,-0.51 +4573172829758,-0.268816758427800,-0.168122758808327,-0.13606154957602 +8,-0.043943347587597,0.014355292977070,0.111031312388744,0.1243281350 +02202,0.137503523749935,0.176322772640463,0.238786859587116,0.2509615 +73533219,0.344828496997441); foreach my $dataset (sort keys %datasets) { my $ttest = new Statistics::TTest; $ttest->load_data(\@{$datasets{$dataset}[0]},\@{$datasets{$dataset +}[1]}); print "$dataset - t_prob:\t$ttest->{t_prob}\n\n"; }
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |