Re^3: Data range detection?

Re^4: Data range detection? by BrowserUk (Patriarch) on Apr 13, 2015 at 17:35 UTC
There are various options to measure the "goodness of fit". The problem is that to do a goodness-of-fit calculation, you need two sets of data: the actual and the expected. The only two sets that make any sense (to me at least) are the pre-scaled and post-scaled sets; but the correlation between those will (should) be perfect whichever scaling method is used, since the latter is derived mathematically from the former. I can't see what other 'expected' values you could use.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
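To illustrate BrowserUk's point for the simplest case, a linear rescale: the Pearson correlation between a data set and a linearly rescaled copy of it is exactly 1 (up to rounding), so that comparison cannot tell one scaling from another. This is a minimal core-Perl sketch; `pearson` is a hypothetical helper, and min-max scaling is just one example of a linear rescale, assumed here for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum min max);

# Pearson correlation coefficient of two equal-length lists (array refs).
sub pearson {
    my ($x, $y) = @_;
    my $n      = @$x;
    my $mean_x = sum(@$x) / $n;
    my $mean_y = sum(@$y) / $n;
    my ($sxy, $sxx, $syy) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my $dx = $x->[$i] - $mean_x;
        my $dy = $y->[$i] - $mean_y;
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    return $sxy / sqrt($sxx * $syy);
}

# The first ten values of the first data set from this thread.
my @raw = (5, 5, 34, 44, 114, 169, 177, 184, 270, 339);

# A linear (min-max) rescale onto 0..1: each value v becomes (v - min) / (max - min).
my ($lo, $hi) = (min(@raw), max(@raw));
my @scaled = map { ($_ - $lo) / ($hi - $lo) } @raw;

printf "r = %.6f\n", pearson(\@raw, \@scaled);    # prints "r = 1.000000"
```

Any other positive-slope linear rescale gives the same result, which is why a "fit" measured against the scaled copy says nothing about which scaling is appropriate.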
Re^5: Data range detection? by hdb (Monsignor) on Apr 13, 2015 at 18:45 UTC
The "expected" data set is the linear fit. The following script uses linear regression and the R^2 metric to calculate a measure of fit for your three datasets:

```perl
use strict;
use warnings;
use Statistics::LineFit;

# Return the R^2 of a least-squares line through the given x and y array refs.
sub fit {
    my $fit = Statistics::LineFit->new();
    $fit->setData( @_ );
    return $fit->rSquared();
}

my @data = (
    [ qw( 5 5 34 44 114 169 177 184 270 339 361 364 442 511 530
          554 555 587 709 709 735 778 791 859 871 899 903 926 933 952 ) ],
    [ 0.5, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, 2047, 4095, 8191,
      16383, 32767, 65535, 131071, 262143, 524287, 1048575, 2097151,
      4194303, 8388607, 16777215, 33554431, 67108863, 134217727,
      268435455, 536870911, 1073741823 ],
    [ 1.713125e-005, 1.748086e-006, 2.101463e-006, 1.977405e-006,
      3.597675e-006, 3.725492e-006, 3.924736e-006, 2.902199e-006,
      3.988645e-006, 8.210367e-006, 3.360837e-006, 5.202907e-006,
      7.082570e-006, 8.778026e-006, 7.079562e-005, 9.100576e-005,
      5.258545e-005, 9.292677e-005, 1.789815e-004, 2.113948e-003,
      7.229146e-004, 1.428995e-003, 2.742045e-003, 5.552746e-003,
      1.822390e-002, 2.220999e-002, 4.316067e-002, 8.876963e-002,
      1.751072e-001, 3.494051e-001, 7.155960e-001, 1.347822e+000 ],
);

print "    linear  loglinear     loglog\n";
for my $d (@data) {                 # $d is a reference to one data set
    my @x    = 1 .. @$d;            # x is the position 1..N of each value
    my @logx = map log, @x;         # log of the positions
    my @logd = map log, @$d;        # log of the data values themselves
    printf "%10.2f %10.2f %10.2f\n",
        fit( \@x, $d ), fit( \@x, \@logd ), fit( \@logx, \@logd );
}
```

The result is

```
    linear  loglinear     loglog
      0.99       0.69       0.95
      0.26       1.00       0.86
      0.26       0.90       0.58
```

which shows that the first data set describes a linear relationship, while the other two are closer to a log-linear one (log of the value against its position), since the largest R^2 wins. If you have a stats package at hand (or even just Excel), you can do the same thing and visualize the results.
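For readers who want to see what the R^2 column measures without installing Statistics::LineFit, here is a minimal core-Perl sketch. It assumes that `rSquared()` reports the usual unweighted value, 1 - SS_res/SS_tot (equivalently the squared correlation), when no weights are given; `r_squared` and `best_model` are hypothetical helper names, and the test data is synthetic rather than one of the three data sets above.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# R^2 of an ordinary (unweighted) least-squares line y = intercept + slope*x,
# computed as 1 - SS_res / SS_tot.
sub r_squared {
    my ($x, $y) = @_;
    my $n      = @$x;
    my $mean_x = sum(@$x) / $n;
    my $mean_y = sum(@$y) / $n;
    my ($sxy, $sxx) = (0, 0);
    for my $i (0 .. $n - 1) {
        $sxy += ($x->[$i] - $mean_x) * ($y->[$i] - $mean_y);
        $sxx += ($x->[$i] - $mean_x) ** 2;
    }
    my $slope     = $sxy / $sxx;
    my $intercept = $mean_y - $slope * $mean_x;
    my ($ss_res, $ss_tot) = (0, 0);
    for my $i (0 .. $n - 1) {
        my $predicted = $intercept + $slope * $x->[$i];
        $ss_res += ($y->[$i] - $predicted) ** 2;
        $ss_tot += ($y->[$i] - $mean_y) ** 2;
    }
    return 1 - $ss_res / $ss_tot;
}

# Score one data set (array ref, all values > 0) under the same three models
# as the script above and return the best model's name plus all the scores.
sub best_model {
    my ($d)  = @_;
    my @x    = 1 .. scalar(@$d);       # x is the position 1..N
    my @logx = map { log($_) } @x;
    my @logd = map { log($_) } @$d;
    my %r2 = (
        linear    => r_squared(\@x,    $d),
        loglinear => r_squared(\@x,    \@logd),
        loglog    => r_squared(\@logx, \@logd),
    );
    my ($best) = sort { $r2{$b} <=> $r2{$a} } keys %r2;   # largest R^2 wins
    return ($best, \%r2);
}

# Synthetic data (not from the thread): y = 2**x grows exponentially.
my @doubling = map { 2 ** $_ } 1 .. 20;
my ($best, $r2) = best_model(\@doubling);
printf "%-9s %.4f\n", $_, $r2->{$_} for qw(linear loglinear loglog);
print "best: $best\n";    # prints "best: loglinear"
```

Because log(2**x) = x*log(2) is exactly linear in x, the log-linear model scores (essentially) 1 here, which mirrors the second row of the table above.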
Re^6: Data range detection? by BrowserUk (Patriarch) on Apr 13, 2015 at 19:12 UTC
Sorry, but unless my eyes are deceiving me (quite possible), you don't appear to be fitting the data at all:

```perl
my @x    = 1 .. @$d;        ### Takes the values 1..30, 1..31, and 1..32
my @logx = map log, @x;     ### are the logs of those sequential ranges
my @logd = map log, @$d;    ### the loglogs of those sequential ranges.
printf "%10.2f %10.2f %10.2f\n", fit( \@x, $d ), fit( \@x, \@logd ), fit( \@logx, \@logd );
```

The actual data is never passed to the fit sub?
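As BrowserUk's own follow-up in this thread concedes, the data is passed: in `for my $d (@data)`, `$d` is each element of `@data`, that is, a reference to one data set, so `fit( \@x, $d )` hands over the raw values as y, and `@logd` holds the logs of the data, not of the index range. A minimal sketch of that aliasing, where `show_args` is a hypothetical stand-in for `fit()` and the data sets are truncated for brevity:

```perl
use strict;
use warnings;

# Hypothetical stand-in for hdb's fit(): describes what it was handed.
sub show_args {
    my ($x, $y) = @_;
    return sprintf "x has %d values, y has %d values, y->[0] = %s",
        scalar(@$x), scalar(@$y), $y->[0];
}

# Two truncated data sets, shaped like hdb's @data (an array of array refs).
my @data = ( [ 5, 5, 34, 44 ], [ 0.5, 1, 3, 7, 15 ] );

for my $d (@data) {                   # $d is a reference to one data set
    my @x = 1 .. scalar(@$d);         # only the x axis is the index range
    print show_args(\@x, $d), "\n";   # $d itself supplies the y values
}
```

Running it prints one line per data set, and the `y->[0]` values (5 and 0.5) are the first data values, confirming that the y argument is the data rather than the index range.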
Re^7: Data range detection? by hdb (Monsignor) on Apr 13, 2015 at 19:22 UTC
Re^8: Data range detection? by BrowserUk (Patriarch) on Apr 13, 2015 at 19:39 UTC
Re^6: Data range detection? by BrowserUk (Patriarch) on Apr 14, 2015 at 03:59 UTC
Sorry hdb, it seems it was more than just my eyes giving me trouble last night. And given it was you, I should have known better.
Re^7: Data range detection? by hdb (Monsignor) on Apr 14, 2015 at 06:22 UTC