comment on

Dear Monks,

I am having a mysterious problem that I can't figure out. I am using Statistics::TTest to calculate p-values for several thousand pairs of distributions of numbers. I'm using these p-values to create a volcano plot, and when I plot the p-values I observe a strange artifact where many of the points are getting the same p-value. After some sleuthing, I can re-create the phenomenon with the four pairs of numbers in the code below. When I calculate p-values for these pairs in Excel, the values are all quite distinct (different by several orders of magnitude) but using Statistics::TTest I'm getting EXACTLY the same value for each pair.

The p-values are very small (around 1.6e-12) so I wonder if this ins't some sort of precision issue... but I can't figure it out. If you run the code below, it will display four identical p-values (T-test probabilities, "t_prob") despite the fact that the true p-values range from 1.7e-19 to 2.8e-29.

I have tried doing something similar using Statistics::Distributions but I had the same problem, however I think Statistics::TTest relies on Statistics::Distributions for these calculations. I was not able to find any other modules that perform this calculation.

I should note that the vast majority (99%) of pairs of distributions are getting correct p-values, it's just a handful that produce this strange collision on a wrong value.

Does anyone out there have any insight into what's going wrong? Help is very much appreciated, thanks!


#!/usr/bin/perl -w

use strict;
use Statistics::TTest;

my %datasets = ();


@{$datasets{a}[0]} = (0.722466024,0.925999419,1,1.049630768,1.05658352
+8,1.10433666,1.13093087,1.150559677,1.220329955,1.316145742,1.3334237
+34,1.63691458,1.691534165,0.713695815,0.815575429,0.918386234,0.92599
+9419,0.941106311,0.948600847,0.970853654,0.98550043,0.98550043,1,1.02
+8569152,1.042644337,1.10433666,1.117695043,1.269033146,1.286881148,1.
+298658316,1.575312331);

@{$datasets{a}[1]} = (-0.49410907,-0.358453971,-0.321928095,-0.2863041
+85,-0.200912694,-0.200912694,-0.168122759,-0.120294234,-0.120294234,-
+0.104697379,-0.104697379,-0.074000581,-0.058893689,-0.577766999,-0.51
+4573173,-0.514573173,-0.49410907,-0.358453971,-0.358453971,-0.3040061
+87,-0.251538767,-0.184424571);

@{$datasets{b}[0]} = (-0.434402824,-0.286304185,-0.251538767,-0.058893
+689,-0.043943348,-0.043943348,0.084064265,0.163498732,0.23878686,0.23
+878686,0.310340121,0.839959587,0.879705766,-0.556393349,-0.268816758,
+-0.251538767,-0.152003093,-0.104697379,-0.089267338,-0.029146346,-0.0
+29146346,0,0.070389328,0.084064265,0.097610797,0.124328135,0.13750352
+4,0.189033824,0.189033824,0.214124805,0.214124805,0.214124805,0.32192
+8095,0.333423734,0.367371066,0.40053793,0.411426246,0.443606651,0.516
+015147,0.669026766,0.713695815);

@{$datasets{b}[1]} = (0.782408565,0.799087306,0.82374936,0.887525271,0
+.925999419,0.933572638,0.956056652,0.97819563,0.98550043,1.021479727,
+1.084064265,1.097610797,1.13093087,1.150559677,1.15704371,1.176322773
+,1.182692298,1.22650853,1.286881148,1.292781749,1.310340121,1.4594316
+19,1.485426827,1.521050737,1.59454855,1.695993813,1.713695815,1.72683
+1217,0.40053793,0.411426246,0.59454855,0.925999419,0.941106311,0.9486
+00847,0.98550043,1.028569152,1.070389328,1.117695043,1.124328135,1.22
+0329955,1.316145742,1.744161096);

@{$datasets{c}[0]} = (-0.043943348,-0.029146346,-0.01449957,0.02856915
+2,0.097610797,0.124328135,0.176322773,0.201633861,0.263034406,-0.8624
+96476,-0.104697379,0.084064265,0.084064265,0.084064265,0.124328135,0.
+124328135,0.163498732,0.263034406,0.275007047,0.286881148,0.321928095
+,0.333423734);

@{$datasets{c}[1]} = (-2.64385619,-2.556393349,-2.473931188,-2.3959286
+76,-2.395928676,-2.395928676,-2.321928095,-2.321928095,-2.321928095,-
+2.251538767,-2.251538767,-2.184424571,-2.120294234,-2,-0.535331733,-1
+.64385619,-1.556393349,-1.514573173,-1.514573173,-1.473931188,-1.4344
+02824,-1.434402824,-1.395928676,-1.395928676,-1.395928676,-1.39592867
+6,-1.358453971,-1.358453971,-1.358453971,-1.358453971,-1.358453971,-1
+.321928095,-1.286304185,-1.286304185,-1.286304185,-1.251538767,-1.217
+591435,-1.120294234,-1);


@{$datasets{d}[0]} = (0.933572638261024,0.948600847493356,0.9486008474
+93356,0.970853654340483,0.978195629681652,1.111031312388740,1.1505596
+76575380,1.416839741912830,0.731183241572200,0.790772037862000,0.8155
+75428862573,0.855989697308481,0.871843648509318,0.895302621333307,0.9
+33572638261024,0.941106310946431,0.948600847493356,0.956056652412403,
+0.970853654340483,0.992768430768924,1.000000000000000,1.0635029423061
+60,1.226508529808680,1.269033146455240,1.298658315564520,1.7048719644
+56350);

@{$datasets{d}[1]} = (-0.473931188332412,0.028569152196771,0.042644337
+408494,0.056583528366368,0.070389327891398,0.084064264788475,0.097610
+796626422,0.111031312388744,0.454175893185802,0.454175893185802,-0.51
+4573172829758,-0.268816758427800,-0.168122758808327,-0.13606154957602
+8,-0.043943347587597,0.014355292977070,0.111031312388744,0.1243281350
+02202,0.137503523749935,0.176322772640463,0.238786859587116,0.2509615
+73533219,0.344828496997441);


foreach my $dataset (sort keys %datasets) {

    my $ttest = new Statistics::TTest;
    $ttest->load_data(\@{$datasets{$dataset}[0]},\@{$datasets{$dataset
+}[1]});
    print "$dataset - t_prob:\t$ttest->{t_prob}\n\n";

}
[download]

In reply to strange collision when calculating p-values using Statistics::TTest by roadnottaken

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.