f77coder has asked for the wisdom of the Perl Monks concerning the following question:
Hello
I'm interested in recommendations for clustering where speed matters more than a lightweight/small footprint, so I'd prefer loops over one-liners if the loop executes faster. I'm looking through the large list of CPAN modules (AI, Bayes, Cluster, etc.) and would like to narrow down the search. I don't mind getting the source code and hacking it if it doesn't quite match what I need to do; I don't expect something to work as is.
The input data is a mixture of integers and strings, all of it categorical. I'd like to treat each data line as an array and process it as a vector. Think of it as a 1-D image processing problem: how many pixels are different?
For example,
line1=> cat1=123, cat2=92, cat3=5, cat4='0xffa411', cat5='0x221133', cat6='0xa291f1'
line2=> cat1=3, cat2=92, cat3=5, cat4='0xaf1401', cat5='0xaaffcc', cat6='0xa23af1'
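The "how many pixels differ" view maps onto a common Perl idiom, sketched here under stated assumptions: give every distinct value in a column a small integer code, pack each line into a byte string with one byte per category, then XOR two strings and count the bytes that are not `"\0"`. `encode_record`, `mismatches` and `@codebook` are hypothetical names, and the one-byte encoding assumes no category has more than 256 distinct values. The comparison runs inside perl's own string code rather than a Perl-level loop, which is often faster, but that is worth confirming on real data with the core `Benchmark` module.

```perl
use strict;
use warnings;

# Hypothetical per-column codebook: $codebook[$col]{$value} => small integer code.
my @codebook;

# Pack one record into a byte string, one byte per category.
# Assumes no category ever has more than 256 distinct values.
sub encode_record {
    my @fields = @_;
    my $packed = '';
    for my $col (0 .. $#fields) {
        my $book = $codebook[$col] ||= {};
        my $val  = $fields[$col];
        unless (exists $book->{$val}) {
            my $next = scalar keys %$book;    # next unused code in this column
            die "column $col has more than 256 distinct values\n" if $next > 255;
            $book->{$val} = $next;
        }
        $packed .= chr $book->{$val};
    }
    return $packed;
}

# Number of categories in which two encoded records differ:
# equal bytes XOR to "\0", and tr/\0//c counts every byte that is not "\0".
sub mismatches {
    my ($x, $y) = @_;
    return ($x ^ $y) =~ tr/\0//c;
}

my $r1 = encode_record(123, 92, 5, '0xffa411', '0x221133', '0xa291f1');
my $r2 = encode_record(3,   92, 5, '0xaf1401', '0xaaffcc', '0xa23af1');
print mismatches($r1, $r2), "\n";    # prints 4
```

On the two example lines this counts cat1, cat4, cat5 and cat6 as different.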
I'd like to create a distance measurement based only on the number of categories that differ. In this case the distance map would be (cat2, cat3, 4): cat2 and cat3 match, and the other four categories differ. There will probably be a weighting function applied to this metric as well.
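A plain loop version that returns the distance map in the same form, matching categories followed by the distance, might look like this. It is a minimal sketch: `distance_map`, `@names` and `%weight` are hypothetical, and the weighting is assumed to be a per-category weight summed over the categories that differ; with every weight set to 1 it reduces to the unweighted count.

```perl
use strict;
use warnings;

# Hypothetical category names and per-category weights.
my @names  = map { "cat$_" } 1 .. 6;
my %weight = map { $_ => 1 } @names;    # all 1: plain count of differences

# Returns the names of the matching categories followed by the weighted
# sum over the categories that differ, e.g. (cat2, cat3, 4).
sub distance_map {
    my ($x, $y) = @_;    # two array refs of equal length
    my @same;
    my $dist = 0;
    for my $i (0 .. $#$x) {
        if ($x->[$i] eq $y->[$i]) {    # string compare: values are categorical
            push @same, $names[$i];
        }
        else {
            $dist += $weight{ $names[$i] };
        }
    }
    return (@same, $dist);
}

my @line1 = (123, 92, 5, '0xffa411', '0x221133', '0xa291f1');
my @line2 = (3,   92, 5, '0xaf1401', '0xaaffcc', '0xa23af1');
print join(',', distance_map(\@line1, \@line2)), "\n";    # prints cat2,cat3,4
```

Raising `$weight{cat1}` to 2, for instance, would make the same pair score 5.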
Once training is complete, I'd like to use the trained classifier/clustering to make a prediction for a new line.
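For the train-then-predict step, one common choice for purely categorical data is k-modes, a k-means variant that uses the mismatch count as its distance and the most frequent value per column in place of the mean. It is swapped in here as an assumption, not something the post or a particular CPAN module prescribes. The sketch uses hypothetical names (`train_kmodes`, `nearest`), made-up toy records, and deliberately naive seeding from the first k lines; predicting for a new line simply means finding its nearest trained mode.

```perl
use strict;
use warnings;

# Simple matching distance: number of positions where two records differ.
sub mismatches {
    my ($x, $y) = @_;
    my $d = 0;
    for my $i (0 .. $#$x) {
        $d++ if $x->[$i] ne $y->[$i];
    }
    return $d;
}

# Index of the mode closest to $rec; ties go to the lowest index.
sub nearest {
    my ($rec, $modes) = @_;
    my ($best, $best_d) = (0, mismatches($rec, $modes->[0]));
    for my $k (1 .. $#$modes) {
        my $d = mismatches($rec, $modes->[$k]);
        ($best, $best_d) = ($k, $d) if $d < $best_d;
    }
    return $best;
}

# Minimal k-modes. Modes start as copies of the first $k records (naive seeding).
sub train_kmodes {
    my ($data, $k, $max_iter) = @_;
    my @modes  = map { [ @{ $data->[$_] } ] } 0 .. $k - 1;
    my @assign = (-1) x @$data;
    for my $iter (1 .. $max_iter) {
        # Assignment step: move each record to its nearest mode.
        my $changed = 0;
        for my $i (0 .. $#$data) {
            my $c = nearest($data->[$i], \@modes);
            $changed++ if $c != $assign[$i];
            $assign[$i] = $c;
        }
        last unless $changed;
        # Update step: each mode takes the most frequent value per column
        # among its members (ties broken alphabetically for repeatability).
        for my $c (0 .. $k - 1) {
            my @members = grep { $assign[$_] == $c } 0 .. $#$data;
            next unless @members;
            for my $col (0 .. $#{ $modes[$c] }) {
                my %count;
                $count{ $data->[$_][$col] }++ for @members;
                ($modes[$c][$col]) =
                    sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
            }
        }
    }
    return (\@modes, \@assign);
}

my @data = (
    [qw(red  small  round)],
    [qw(blue large  square)],
    [qw(red  small  oval)],
    [qw(blue large  cube)],
    [qw(red  medium round)],
);
my ($modes, $assign) = train_kmodes(\@data, 2, 10);
print "clusters: @$assign\n";                          # clusters: 0 1 0 1 0
print nearest([qw(blue large oval)], $modes), "\n";    # 1
```

If the training lines carry known labels, comparing a new line against the labelled lines themselves and voting among the few nearest turns the same `mismatches` function into a k-nearest-neighbour classifier; a weighted distance could replace `mismatches` in either case.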
Thanks
Re: Clustering/classifying recommendations
by Laurent_R (Canon) on Aug 19, 2014 at 20:24 UTC
by f77coder (Beadle) on Aug 20, 2014 at 03:51 UTC