Making some progress on the module. Here's some sample data for one column, showing the raw count and the cardinality value (that value's share of the rows) for each unique value in the column:
$VAR1 = {
    'ACTIVE'   => { 'count' => 1941, 'value_card' => '0.631630328669053' },
    'INACTIVE' => { 'value_card' => '0.233322486169867', 'count' => 717 },
    'RETIRED'  => { 'count' => 414, 'value_card' => '0.134721770257078' },
    'STATUS'   => { 'count' => 1, 'value_card' => '0.000325414904002603' }
};
So in this simple case, the 'STATUS' value occurs only once in this column and is clearly an outlier compared with the other three values. But in fuzzier situations, how would I determine whether 'STATUS' is "1 standard deviation" away from the cardinalities of the other values?
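To make the question concrete, here is a rough sketch of one way to read "1 standard deviation away from the others": for each value, take the mean and sample standard deviation of the *other* values' value_card and see how many standard deviations the candidate sits from that mean. The helper names mean_and_sd and leave_one_out_z and the cutoff of -1 are made up for illustration, and with only a handful of distinct values the standard deviation is a shaky estimate, so treat this as a starting point rather than a settled method.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Sample data copied from the dump above.
my %column = (
    ACTIVE   => { count => 1941, value_card => 0.631630328669053 },
    INACTIVE => { count => 717,  value_card => 0.233322486169867 },
    RETIRED  => { count => 414,  value_card => 0.134721770257078 },
    STATUS   => { count => 1,    value_card => 0.000325414904002603 },
);

# Hypothetical helper: mean and sample standard deviation (n - 1 divisor).
sub mean_and_sd {
    my @x    = @_;
    my $mean = sum(@x) / @x;
    my $var  = sum( map { ( $_ - $mean )**2 } @x ) / ( @x - 1 );
    return ( $mean, sqrt $var );
}

# Hypothetical helper: score each value against the stats of all OTHER values.
sub leave_one_out_z {
    my ($col) = @_;
    my %z;
    for my $key ( keys %$col ) {
        my @others = map { $col->{$_}{value_card} }
                     grep { $_ ne $key } keys %$col;
        next if @others < 2;    # need at least two points for a sample sd
        my ( $mean, $sd ) = mean_and_sd(@others);
        next if $sd == 0;       # all other values identical; score undefined
        $z{$key} = ( $col->{$key}{value_card} - $mean ) / $sd;
    }
    return \%z;
}

my $z = leave_one_out_z( \%column );

# Assumed cutoff: a header candidate is unusually RARE, so only look at
# values at least one standard deviation BELOW the others.
my @rare = grep { $z->{$_} <= -1 } sort keys %$z;

printf "%-8s z = %+.2f\n", $_, $z->{$_} for sort keys %$z;
print "Rare candidates: @rare\n";
```

On the sample data only 'STATUS' falls below the cutoff. Note that 'ACTIVE' scores far above the others, which is why the sketch checks one direction only: a dominant value is also an "outlier" by this measure, just not the kind that suggests a stray header.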
$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
$nysus = $PM . ' ' . $MCF;
Click here if you love Perl Monks
In reply to Re^2: Useful heuristics for analyzing arrays of data to determine column header
by nysus
in thread Useful heuristics for analyzing arrays of data to determine column header
by nysus