I haven't read about the concept (yet!) so I'm just commenting on your code.
Copying and manipulating those hashes in max_diff is going to be slow because of all the memory copying, and if I've understood correctly you don't need to do it that way. Wouldn't something like this give you the number you need?
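    sub max_diff {
        ...
        my $count = 0;
        # count keys present in %$hash1 but missing from %$hash2
        for (keys %{$hash1}) {
            $count++ unless exists $hash2->{$_};
        }
        # count keys present in %$hash2 but missing from %$hash1
        for (keys %{$hash2}) {
            $count++ unless exists $hash1->{$_};
        }
        return $count;
    }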
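Here is a self-contained sketch of the same idea that you can run as-is. It assumes max_diff is called with two hash references. The argument handling is elided in the snippet, so the `my ($hash1, $hash2) = @_;` line is a guess, as are the sample hashes %x and %y. Each hash's keys are walked once and checked against the other hash with `exists`, so no temporary copies are built. The result is the number of keys that appear in exactly one of the two hashes.

```perl
use strict;
use warnings;

# Assumption: max_diff receives two hash references. The original post
# elides how $hash1 and $hash2 are obtained.
sub max_diff {
    my ($hash1, $hash2) = @_;
    my $count = 0;

    # Keys in %$hash1 that are absent from %$hash2.
    for (keys %{$hash1}) {
        $count++ unless exists $hash2->{$_};
    }

    # Keys in %$hash2 that are absent from %$hash1.
    for (keys %{$hash2}) {
        $count++ unless exists $hash1->{$_};
    }

    # Number of keys found in exactly one of the two hashes.
    return $count;
}

# Hypothetical sample data; only the keys matter, values are ignored.
my %x = (apple  => 1, banana => 1, cherry => 1);
my %y = (banana => 1, cherry => 1, date   => 1);
print max_diff(\%x, \%y), "\n";    # prints 2 (apple and date)
```

Each `exists` call is an average constant-time hash lookup, so the work grows roughly with the total number of keys in both hashes, without allocating any new hashes.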
In reply to Re: Optimizing a naive clustering algorithm by RichardK, in thread Optimizing a naive clustering algorithm by BUU