in reply to Seeking abnormalities in data sets.
First, to analyze your data properly, you must know, within an acceptable level of confidence, that the model you are using (linear, exponential, or other) is the appropriate one. For example, with a linear model you can use the correlation coefficient r, or its square, the coefficient of determination R², which is the fraction of the variation in the data that the model explains. It tells you whether the model accounts for enough of that variation for you to be confident that the correct model is being used (see a statistics book that covers linear and non-linear regression techniques).
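As a rough illustration of the check described above, here is a minimal sketch of an ordinary least-squares line fit that also reports r and R². The function names `linear_fit_r2` and `exponential_fit_r2` are hypothetical. The exponential helper assumes the common trick of fitting a straight line to ln(y), so its R² describes the fit on the log scale and is not strictly comparable to the linear R². A statistics text will cover proper model comparison.

```python
import math

def linear_fit_r2(xs, ys):
    """Hypothetical helper: least-squares fit y = a + b*x.
    Returns (a, b, r, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x or y has zero variance; correlation is undefined")
    b = sxy / sxx                    # slope
    a = mean_y - b * mean_x          # intercept
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
    return a, b, r, r * r            # R^2 = fraction of variation explained

def exponential_fit_r2(xs, ys):
    """Hypothetical helper: fit y = A * exp(b*x) by regressing ln(y) on x.
    Assumes every y > 0; the R^2 returned is for the log-scale fit."""
    if any(y <= 0 for y in ys):
        raise ValueError("exponential model needs strictly positive y values")
    a, b, r, r2 = linear_fit_r2(xs, [math.log(y) for y in ys])
    return math.exp(a), b, r, r2
```

For example, `linear_fit_r2([1, 2, 3, 4], [3, 5, 7, 9])` gives an intercept of 1, a slope of 2 and r = 1, since those points lie exactly on a line. On real data, an R² well below 1 suggests the model leaves much of the variation unexplained.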
Without getting into too much detail, here is a crude method you can use if you don't have the background to analyze the data to the necessary degree. Suppose you have a 'target' value for each point (e.g. in time, or some other variable), and you don't want to accept data that deviates from it by more than, say, +/- 3%. You can calculate a 'band' around that target data: for a positive target, an upper limit of 1.03 × target and a lower limit of 0.97 × target. Then plot your actual data along with these bands and look at the data visually. You will have three curves (actual, upper limit and lower limit) drawn point-to-point rather than as a fitted regression, which helps if you don't have the tools or the background to determine the actual regression model each time you collect the 40,000 data points. If an actual data point falls outside the band, you may want to look at that particular point more closely. That does not mean you should automatically exclude it, unless you have enough information to support excluding it. Again, this method is considered 'crude'.
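The band check above can also be done in code instead of by eye. This is a minimal sketch, assuming one target value per actual value and a symmetric percentage band. The names `band` and `flag_outside_band` are hypothetical, and flagged points are only candidates for a closer look, not for automatic exclusion.

```python
def band(target, pct):
    """Hypothetical helper: return (lower, upper) limits at +/- pct percent of target."""
    delta = abs(target) * pct / 100.0
    return target - delta, target + delta

def flag_outside_band(targets, actuals, pct=3.0):
    """Hypothetical helper: return the indices of points whose actual value lies
    outside the +/- pct band around its target. Points exactly on a limit pass."""
    if len(targets) != len(actuals):
        raise ValueError("targets and actuals must have the same length")
    flagged = []
    for i, (t, a) in enumerate(zip(targets, actuals)):
        lower, upper = band(t, pct)
        if a < lower or a > upper:
            flagged.append(i)  # worth a closer look, not automatic exclusion
    return flagged
```

With targets of 100 and a 3% band (97 to 103), `flag_outside_band([100, 100, 100, 100], [101, 97, 104, 99.5])` flags only index 2, the reading of 104.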
You can use, e.g., Microsoft Excel to import your data (for instance in a comma-delimited format, which your Perl program can create for you). You can then calculate your bands very easily within Excel (preferred, since it keeps the imported file size to a minimum) for plotting and/or analysis. Excel also has built-in statistical routines. In addition, there is a book called "Microsoft Excel 2000 Formulas" by John Walkenbach (ISBN 0-7645-4609-0) that may give you more information about that software. Of course, there are other statistics books you can use alongside it.
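The export step mentioned above might look something like the following minimal sketch. It is written in Python for illustration (the thread's program is in Perl), and the function name `write_points_csv` and the column layout point/target/actual are hypothetical. Only the target and actual values are written, so the file stays small. With the target in column B and the actual value in column C, the bands can then be added in Excel with formulas such as `=B2*1.03` and `=B2*0.97` (assuming positive targets).

```python
import csv

def write_points_csv(out, targets, actuals):
    """Hypothetical helper: write a header row, then one comma-delimited row
    per point (1-based point number, target, actual) to the file-like object out."""
    if len(targets) != len(actuals):
        raise ValueError("targets and actuals must have the same length")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["point", "target", "actual"])
    for i, (t, a) in enumerate(zip(targets, actuals), start=1):
        writer.writerow([i, t, a])
```

Typical use would be `with open("points.csv", "w", newline="") as f: write_points_csv(f, targets, actuals)`, where the file name is only an example.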
Be cautious when using crude methods; that is, don't try to read too much into the results. Methods like these are often used to give you a 'direction', not conclusions.
Hope some of this helps.
Regards. --newbie00
Re: Re: Seeking abnormalities in data sets. by tmiklas (Hermit) on Dec 27, 2001 at 06:42 UTC

Re: Re: Seeking abnormalities in data sets. by scain (Curate) on Dec 27, 2001 at 20:36 UTC