
jhourcle

Thank you very much for your comments. I will work on a new version of the script, following your suggestions, and I will post it some time next week.

I will address some of your comments below.

"I'm going to guess that this is a straight transfer of a C program to Perl --"

This is a very good guess. In fact, it is correct! I have been programming in Perl for three months now, and I recognize that I have a long way to go. That is one of the reasons I am asking for comments: so that I can improve my Perl coding skills as quickly as possible.

"First, you're using a whole lot of for loops for tracking indexes to arrays:"

That is true. I am trying to overcome that habit.

"In Perl, if you're just trying to iterate over a range, you can use the 'foreach' style loop, with the range operator:"
for my $i ( 0 .. $number_of_clusters-1 ) { ... }
"Even if we were doing this in C, for the type of loops you're dealing with (starting at 0, order of operations doesn't matter), I'd still change the code, to reduce the number of comparisons against non-0 values:"
for (i = number_of_clusters; i--; ) { ... }

I like both options. However, the first one is probably easier to understand for someone who is new to Perl. The second one also works in Perl, but you need to understand that the condition $i-- tests the current value of $i first (the loop continues while that value is non-zero) and only then decrements the variable, so the body sees $i counting down from $number_of_clusters-1 to 0. This might be hard to see for someone new to the language (I had to try it to see what it did).
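A quick way to see the difference is to run both forms side by side. This is a small sketch with an arbitrary cluster count of 3, and it writes the countdown loop in Perl rather than in C; both loops visit the same indices, only in opposite order.

```perl
use strict;
use warnings;

my $number_of_clusters = 3;    # arbitrary value for the demonstration

# Style 1: foreach over a range; $i takes 0, 1, ..., $number_of_clusters - 1
my @ascending;
for my $i ( 0 .. $number_of_clusters - 1 ) {
    push @ascending, $i;
}

# Style 2: C-style countdown. The condition $i-- yields the current value
# of $i (so the loop stops once that value is 0) and only then decrements
# it, so the body sees $number_of_clusters - 1 down to 0.
my @descending;
for ( my $i = $number_of_clusters; $i--; ) {
    push @descending, $i;
}

print "@ascending\n";     # prints "0 1 2"
print "@descending\n";    # prints "2 1 0"
```

The countdown form only pays off when the order of iterations does not matter, which is the condition stated in the reply above.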

"Another change I might make is in how you deal with undefined values -- if the value must be defined, and can't be 0, (e.g., $number_of_clusters), you can use the '||=' operator:"
$number_of_clusters ||= 2;

Thank you for the pointer. Trying the ||= operator made me realize that $number_of_clusters cannot be negative either. So maybe I should do

my $number_of_clusters = abs(shift @ARGV);

followed by the line you suggested. Is there a better way to handle that?
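One alternative is to reject bad input instead of silently repairing it. This sketch assumes the number of clusters must be a whole number of at least 2 (fuzzy c-means needs at least two clusters to be meaningful); the helper name parse_number_of_clusters is hypothetical, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a validated cluster count.
# Uses the default when no argument is given, and dies on anything that is
# not an integer >= 2 (e.g. "-3", "0", "1", "abc") instead of calling abs().
sub parse_number_of_clusters {
    my ( $arg, $default ) = @_;
    return $default unless defined $arg;
    die "number of clusters must be an integer >= 2, got '$arg'\n"
        unless $arg =~ /\A[0-9]+\z/ && $arg >= 2;
    return $arg + 0;    # normalize, e.g. "03" becomes 3
}

my $number_of_clusters = parse_number_of_clusters( shift @ARGV, 2 );
print "clusters: $number_of_clusters\n";
```

The design choice here differs from abs() plus ||=: an argument of "-3" or "0" is reported as an error rather than quietly turned into 3 or 2, which makes typos on the command line easier to notice.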

"The only other thing is in how it's called -- if it were OO, you could inherit from it, and then replace the 'distance' function (or you could have it accept a coderef in for the distance function, if you didn't want to support inheritance), as some people prefer the Manhattan distance when they're dealing with clusters:"

I have to think about this. I need to study OO in Perl first.
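Both options mentioned above (inheritance or a coderef) can live in the same class. This is a minimal sketch in plain Perl 5 OO (bless plus @ISA), assuming the clustering code calls $self->distance(...) rather than a plain function. The package and method names are hypothetical, not from the original script: a subclass can override default_distance, or a caller can pass a distance coderef to new without subclassing at all.

```perl
use strict;
use warnings;

package Hypothetical::FuzzyCMeans;

# Constructor: stores named arguments, e.g. an optional 'distance' coderef.
sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# The clustering code would call $self->distance($x, $y) everywhere.
# Uses the caller's coderef if one was given, else the (overridable) default.
sub distance {
    my ( $self, $x, $y ) = @_;
    return $self->{distance}->( $x, $y ) if $self->{distance};
    return $self->default_distance( $x, $y );
}

# Default metric: Euclidean distance between two array refs.
sub default_distance {
    my ( $self, $x, $y ) = @_;
    my $sum = 0;
    $sum += ( $x->[$_] - $y->[$_] )**2 for 0 .. $#$x;
    return sqrt $sum;
}

# Inheritance route: override only the default metric.
package Hypothetical::FuzzyCMeans::Manhattan;
our @ISA = ('Hypothetical::FuzzyCMeans');

sub default_distance {
    my ( $self, $x, $y ) = @_;
    my $sum = 0;
    $sum += abs( $x->[$_] - $y->[$_] ) for 0 .. $#$x;
    return $sum;
}

package main;

my @p = ( [ 0, 0 ], [ 3, 4 ] );

my $euclid   = Hypothetical::FuzzyCMeans->new;
my $by_class = Hypothetical::FuzzyCMeans::Manhattan->new;
my $by_ref   = Hypothetical::FuzzyCMeans->new(
    distance => sub {    # coderef route: Manhattan distance, no subclass
        my ( $x, $y ) = @_;
        my $sum = 0;
        $sum += abs( $x->[$_] - $y->[$_] ) for 0 .. $#$x;
        return $sum;
    }
);

my @d = map { $_->distance(@p) } $euclid, $by_class, $by_ref;
print "@d\n";    # prints "5 7 7"
```

The coderef route is the lighter of the two when only the metric changes; subclassing becomes more attractive if other steps (initialization, stopping criteria) also need to vary.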

Thanks again.

lin0

Re^3: RFC: Fuzzy Clustering with Perl
by Anonymous Monk on Nov 08, 2006 at 02:10 UTC
    Just out of curiosity, why are you implementing this in Perl? If the number of features goes beyond, say 5, for any reasonable dataset, this will be too slow to be of much use. I'd think this is the sort of thing you'd implement in C and then provide Perl bindings for...

      Hello

      Thank you for your comments. I will try to address them to the best of my ability.

      “Just out of curiosity, why are you implementing this in Perl?”

      I am interested in developing a granular computing implementation using Perl. You can see this post I wrote on the topic. Clustering is an essential part of a granular computing implementation, and because I could not find any existing implementation of Fuzzy C-means in Perl, I decided to write one (basically, I just ported code I had written in C to Perl). I also saw writing a Perl implementation of Fuzzy C-means as a learning opportunity: I have been programming in Perl for three months, so I decided this was a good starting project. Moreover, I need to gain a better understanding of how to program in Perl before I can start my Granular Computing implementation. That is the final goal.
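      For readers unfamiliar with Fuzzy C-means, the standard formulation (Bezdek's) minimizes the objective below for n data points x_k, c cluster centers v_i, memberships u_ik and a fuzzifier m > 1, by alternating the two update rules that follow. This assumes the usual Euclidean norm; other metrics change the distance terms.

```latex
J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, \lVert x_k - v_i \rVert^2,
\qquad \sum_{i=1}^{c} u_{ik} = 1 \ \text{for every } k

u_{ik} = \left( \sum_{j=1}^{c} \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)} \right)^{-1},
\qquad
v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} \, x_k}{\sum_{k=1}^{n} u_{ik}^{m}}
```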

      "If the number of features goes beyond, say 5, for any reasonable dataset, this will be too slow to be of much use."

      That could certainly be the case. However, for the projects I am planning to use this for, I do not expect to have many more than 5 features. In any case, to make it more general, I will start thinking about how to speed up the processing.
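      Before moving to C, one cheap speed-up is to compute each point-to-center distance once per iteration and reuse it in the membership sums. This is a hedged sketch of a single iteration of standard fuzzy c-means, not the original poster's script; it assumes Euclidean distance, points stored as array refs of feature values, and a fuzzifier $m > 1. The function name fcm_iteration is hypothetical.

```perl
use strict;
use warnings;

# One fuzzy c-means iteration. Returns (membership matrix, new centers).
# $data: array ref of points; $centers: array ref of centers; $m > 1.
sub fcm_iteration {
    my ( $data, $centers, $m ) = @_;
    my $n   = @$data;
    my $c   = @$centers;
    my $exp = 2 / ( $m - 1 );

    # Distance cache: $dist[$i][$k] = ||x_k - v_i||, computed once.
    my @dist;
    for my $i ( 0 .. $c - 1 ) {
        for my $k ( 0 .. $n - 1 ) {
            my $sum = 0;
            $sum += ( $data->[$k][$_] - $centers->[$i][$_] )**2
                for 0 .. $#{ $data->[$k] };
            $dist[$i][$k] = sqrt $sum;
        }
    }

    # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
    my @u;
    for my $k ( 0 .. $n - 1 ) {
        # A point sitting exactly on a center gets full membership there.
        my @zero = grep { $dist[$_][$k] == 0 } 0 .. $c - 1;
        if (@zero) {
            $u[$_][$k] = 0 for 0 .. $c - 1;
            $u[ $zero[0] ][$k] = 1;
            next;
        }
        for my $i ( 0 .. $c - 1 ) {
            my $s = 0;
            $s += ( $dist[$i][$k] / $dist[$_][$k] )**$exp for 0 .. $c - 1;
            $u[$i][$k] = 1 / $s;
        }
    }

    # Center update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
    my $dim = @{ $data->[0] };
    my @new_centers;
    for my $i ( 0 .. $c - 1 ) {
        my @num = (0) x $dim;
        my $den = 0;
        for my $k ( 0 .. $n - 1 ) {
            my $w = $u[$i][$k]**$m;
            $num[$_] += $w * $data->[$k][$_] for 0 .. $dim - 1;
            $den += $w;
        }
        # Keep the old center if no point has any weight in this cluster.
        $new_centers[$i] =
            $den > 0 ? [ map { $_ / $den } @num ] : [ @{ $centers->[$i] } ];
    }
    return ( \@u, \@new_centers );
}
```

      A full run would call fcm_iteration repeatedly, feeding the returned centers back in, until they stop moving by more than some tolerance. If profiling still shows the inner loops dominating, that is the part worth moving to C.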

      “I'd think this is the sort of thing you'd implement in C and then provide Perl bindings for...”

      This could be a good solution. In fact, I checked on CPAN and Algorithm::Cluster is implemented that way: as a Perl interface to the C Clustering Library. That is something I will certainly consider in the very near future.

      Again, thank you for your comments.

      Cheers!

      lin0