BrowserUk:

By "evenly distributed", I meant choosing the distribution that spreads the points out most evenly. For example, an exponential distribution plotted on a linear axis bunches everything up at the left. Working out exactly what "evenly distributed" should mean would be a headache. So this morning I hacked together something that was good enough to choose between linear and logarithmic for the number series you provided. As a rough measure of "evenly distributed", I counted the points to the left of the midpoint of the range, compared that count to half the number of points, and selected the scale where the difference was smallest.

From memory, it went something like:

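    use List::Util qw(min max);

    # Is a list of positive numbers spread more evenly on a linear or a log scale?
    sub check_list {
        my $r = shift;
        my ($min, $max) = (min(@$r), max(@$r));
        # Midpoint of the range on each scale
        my $ctr_lin = ($min + $max) / 2;
        my $ctr_log = (log($min) + log($max)) / 2;
        # Count the points left of each midpoint
        my ($cnt_lin, $cnt_log) = (0, 0);
        for (@$r) {
            ++$cnt_lin if $_ < $ctr_lin;
            ++$cnt_log if log($_) < $ctr_log;
        }
        # Distance of each count from a half/half split
        my $error_lin = abs($cnt_lin - @$r/2);
        my $error_log = abs($cnt_log - @$r/2);
        return $error_lin < $error_log ? "linear" : "log";
    }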
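As a quick sanity check of the scoring, here's a minimal sketch that scores the 2**k - 1 series (k = 1 .. 30) on both scales. The helper name split_error is made up for illustration; it measures how far the number of points above the midpoint is from half the points. Only the largest value lies above the linear midpoint, while exactly half the values lie above the log midpoint, so the log scale wins.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# The 2**k - 1 series (k = 1 .. 30)
my @pts = map { 2**$_ - 1 } 1 .. 30;

# Hypothetical helper: distance between the number of points above the
# midpoint of [min, max] and half the number of points (0 = perfect split).
sub split_error {
    my @v     = @_;
    my $mid   = (min(@v) + max(@v)) / 2;
    my $above = grep { $_ > $mid } @v;    # grep in scalar context counts matches
    return abs(@v / 2 - $above);
}

my $err_lin = split_error(@pts);                    # only 2**30 - 1 is above the midpoint
my $err_log = split_error(map { log($_) } @pts);    # k = 16 .. 30 are above the midpoint
printf "linear error: %g, log error: %g\n", $err_lin, $err_log;
# prints "linear error: 14, log error: 0"
```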

Update: I mentioned treating the axes separately because some people mentioned curve fitting (IIRC), which implied (to me) using both axes at the same time.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^4: Data range detection?
by roboticus (Chancellor) on Apr 14, 2015 at 11:09 UTC

    Just for completeness, here's the one I coded up yesterday morning:

        $ cat choose_axes.pl
        #!/usr/bin/env perl
        use strict;
        use warnings;

        my @series = (
            [qw( 5 5 34 44 114 169 177 184 270 339 361 364 442 511 530
                 554 555 587 709 709 735 778 791 859 871 899 903 926 933 952 )],
            [ 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, 2047, 4095, 8191,
              16383, 32767, 65535, 131071, 262143, 524287, 1048575, 2097151,
              4194303, 8388607, 16777215, 33554431, 67108863, 134217727,
              268435455, 536870911, 1073741823 ],
            [ 1.713125e-005, 1.748086e-006, 2.101463e-006, 1.977405e-006,
              3.597675e-006, 3.725492e-006, 3.924736e-006, 2.902199e-006,
              3.988645e-006, 8.210367e-006, 3.360837e-006, 5.202907e-006,
              7.082570e-006, 8.778026e-006, 7.079562e-005, 9.100576e-005,
              5.258545e-005, 9.292677e-005, 1.789815e-004, 2.113948e-003,
              7.229146e-004, 1.428995e-003, 2.742045e-003, 5.552746e-003,
              1.822390e-002, 2.220999e-002, 4.316067e-002, 8.876963e-002,
              1.751072e-001, 3.494051e-001, 7.155960e-001, 1.347822e+000 ],
        );

        for my $ar (@series) {
            my ($type, $min, $max) = choose_axis_params($ar);
            print "($min .. $max) $type\n";
        }

        # Smallest and largest values of a list
        sub minmax {
            my $min = my $max = shift;
            while (@_) {
                my $t = shift;
                $min = $t < $min ? $t : $min;
                $max = $t > $max ? $t : $max;
            }
            return $min, $max;
        }

        # Count the points above the midpoint of [min, max]; $err is how far
        # that count is from half the points.
        sub check_axis {
            my $name   = shift;
            my @points = @_;
            my ($min, $max) = minmax(@points);
            my $midpoint = ($min + $max) / 2;
            my $cnt = 0;
            for my $t (@points) {
                ++$cnt if $t > $midpoint;
            }
            my $err = abs(@points/2 - $cnt);
            return $name, $min, $max, $midpoint, $cnt, $err;
        }

        # Score the linear and log versions of the series and return the
        # results for the one with the smaller error (last element).
        sub choose_axis_params {
            my $r = shift;
            my ($min, $max) = minmax(@$r);
            $r = [ sort { $a <=> $b } @$r ];    # numeric sort
            my @axes;
            push @axes, [ check_axis('linear', @$r) ];
            push @axes, [ check_axis('log', map { log($_) } @$r) ];
            @axes = sort { $a->[-1] <=> $b->[-1] } @axes;
            #for my $r (@axes) {
            #    printf "%-8.8s (%s .. %s) %s, %s, %s\n", @$r;
            #}
            return @{$axes[0]};
        }

        $ perl choose_axes.pl
        (5 .. 952) linear
        (0 .. 20.794415415867) log
        (-13.2569890828565 .. 0.298489956293335) log

    There's nothing special about it: it simply picks whichever scale splits the points more evenly between the two halves of the interval. So it will probably choose poorly for the vertical axis of a half-wave rectified sine wave or similar data. (I'm guessing it would choose a log axis instead of linear in that case...)
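    One wrinkle worth noting for that example: a log axis is only defined when every value is positive, and a half-wave rectified sine sits at exactly zero for half of each period. If the samples include those flat stretches, Perl's log() dies before any scoring happens. Here's a minimal sketch of a guard. The names split_error and choose_scale_safe are made up for illustration, and falling back to linear is just one possible choice:

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: distance of the count above the midpoint from n/2.
sub split_error {
    my @v     = @_;
    my $mid   = (min(@v) + max(@v)) / 2;
    my $above = grep { $_ > $mid } @v;
    return abs(@v / 2 - $above);
}

# Hypothetical guard: only consider a log axis when every value is positive,
# because log() dies on zero or negative input.
sub choose_scale_safe {
    my @v = @_;
    return 'linear' if min(@v) <= 0;
    return split_error(map { log($_) } @v) < split_error(@v) ? 'log' : 'linear';
}

# Half-wave rectified sine: two periods of 40 samples, negative lobes clipped to 0
my $pi   = 4 * atan2(1, 1);
my @wave = map { my $s = sin(2 * $pi * $_ / 40); $s > 0 ? $s : 0 } 0 .. 79;

print choose_scale_safe(@wave), "\n";                         # zeros present: linear
print choose_scale_safe(map { 2**$_ - 1 } 1 .. 30), "\n";    # all positive: log
```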

    ...roboticus
