in reply to Re: RFC: Statistics::KernelEstimation - Kernel Density Estimates and Histograms
in thread RFC: Statistics::KernelEstimation - Kernel Density Estimates and Histograms

Many thanks for your comments. Let me address them:

User-defined Kernel: I debated that with myself. Mechanically it would be really easy, since the choice of kernel function is already implemented in terms of function references.

However, the protocol that a user-supplied kernel function has to adhere to is larger than one might think. It is not just the calling interface: the kernel also has to be normalized, and the user has to supply its integral for use with the CDF, and possibly its second derivative for use with the bandwidth optimization. What is more, the choice of kernel function is not really that critical: all the common kernels give more or less the same results. And the two most useful and most popular ones, the Gaussian and the Epanechnikov kernel, are already included.
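To make the size of that protocol concrete, here is a minimal sketch in Python (not the module's own Perl code). The `Kernel` bundle, the function names and the `integrate` helper are all hypothetical, and the Epanechnikov kernel is assumed to be in its textbook form on [-1, 1], which may be scaled differently from what the module uses. Each kernel has to bring three consistent pieces: the density, its integral, and its second derivative.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Kernel:
    """Hypothetical bundle of what a user-supplied kernel would need."""
    pdf: Callable[[float], float]  # the kernel itself; must integrate to 1
    cdf: Callable[[float], float]  # integral of pdf from -infinity to u
    d2: Callable[[float], float]   # second derivative of pdf


def _gauss_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def _gauss_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def _gauss_d2(u):
    return (u * u - 1.0) * _gauss_pdf(u)


def _epan_pdf(u):
    # Textbook Epanechnikov kernel on [-1, 1] (assumed scaling).
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def _epan_cdf(u):
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


def _epan_d2(u):
    return -1.5 if abs(u) < 1.0 else 0.0


GAUSS = Kernel(_gauss_pdf, _gauss_cdf, _gauss_d2)
EPAN = Kernel(_epan_pdf, _epan_cdf, _epan_d2)


def integrate(f, lo, hi, n=20000):
    """Midpoint-rule integral, used only to sanity-check a kernel."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    for name, k in (("gauss", GAUSS), ("epanechnikov", EPAN)):
        print(name, "normalization:", integrate(k.pdf, -10.0, 10.0))
```

A module accepting arbitrary kernels would either have to trust that all three pieces agree or run checks like `integrate` itself, which is the added complexity in question.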

So, with those considerations, it seemed that allowing user-defined kernel functions would add considerable complexity without enough added benefit. Therefore I decided against it.

(And if somebody really needs an additional kernel, they can always derive their own subclass from this module, providing the new kernel in the implementation!)

Interesting Points as Array: In principle I like the idea, but the problem is the definition of "interesting". That really depends on what the user wants to do with the data! Also, evaluating either the PDF or the CDF is expensive, so I wanted to leave it to the user to determine the step width for the iteration (if you don't need much precision, you get the result faster!).
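The trade-off between step width and cost can be sketched as follows. This is an illustration in Python, not the module's interface: `gauss_kde_pdf` and `tabulate` are hypothetical names, and the sketch assumes a plain Gaussian kernel estimate in which every evaluation sums over all data points. Under that assumption a grid of M points over N data points costs on the order of N*M kernel evaluations, so a coarser step is directly cheaper.

```python
import math


def gauss_kde_pdf(data, x, w):
    """Gaussian kernel density estimate at x with bandwidth w.

    Every call loops over all data points, so one evaluation is O(N).
    """
    norm = len(data) * w * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / w) ** 2) for xi in data) / norm


def tabulate(data, lo, hi, step, w):
    """Evaluate the estimate on a caller-chosen grid from lo to hi."""
    # Small tolerance so that hi itself is included despite rounding.
    n = int(math.floor((hi - lo) / step + 1e-9)) + 1
    return [(lo + i * step, gauss_kde_pdf(data, lo + i * step, w))
            for i in range(n)]


if __name__ == "__main__":
    data = [1.0, 2.0, 2.5, 4.0]  # made-up sample values
    coarse = tabulate(data, 0.0, 5.0, 1.0, 0.5)  # few points, fast
    fine = tabulate(data, 0.0, 5.0, 0.1, 0.5)    # many points, slower
    print(len(coarse), "coarse points,", len(fine), "fine points")
```

Leaving `step` to the caller keeps that cost decision in the caller's hands instead of having the module guess which points are "interesting".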

Integration with PDL: That's an interesting idea. I need to look into that.

Again, good comments. Thanks a lot! I hope my replies make sense.
