joni has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I want to use the module Clusterize, but I have problems when trying it. When I type the code:

use Clusterize;

my $clusterize = Clusterize->new();
$clusterize->add_pair(1,['1', '2', '3' ]);
$clusterize->add_pair(2,['4', '6', '5']);
$clusterize->add_pair(3,['1', '2', '3', '2', '3']);
$clusterize->add_pair(4,['1', '2', '3' ]);
$clusterize->add_pair(5,['4', '6', '5']);
$clusterize->add_pair(6,['1', '2', '3', '2', '3']);
$clusterize->add_pair(7,['1', '2', '3' ]);
$clusterize->add_pair(8,['4', '6', '5']);
$clusterize->add_pair(9,['a', 'b', 'c', 'd', 't']);

my @clusters = $clusterize->list;
print scalar(@clusters);

The result is 0. So there is no element in the clusters array. Can anyone help me how to use this module?

Replies are listed 'Best First'.
Re: How can I use Clusterize
by toolic (Bishop) on Apr 21, 2011 at 13:28 UTC
    When I'm trying to find out how a CPAN module works, I go to its webpage: Clusterize.

    Since its POD is a little sparse, let's look for code samples under an "examples" link (or something like it) under the "MANIFEST" link. Well, there are no example scripts for this module.

    Another place to look for code samples is under the test directory ("t"). Great... I see 4 tests. Let's take a look inside of them. Drat! Nothing there either. The furthest any test goes is to call the constructor.

    Perhaps this module doesn't actually do what you expect it to do. You will have to prove that to yourself by looking into its source code.

Re: How can I use Clusterize
by wind (Priest) on Apr 21, 2011 at 13:44 UTC

    Clusterize does not appear to be a finished module.

    Looking at the source for Clusterize->add_pair shows that it doesn't actually do anything:

    sub add_pair {
        my ($self, $key, $digest) = @_;
        return if $self->pair($key);
        $digest = Clusterize::Pattern->text2digest($digest)
            if ref $digest eq 'ARRAY';
        $self->pair($key, $digest);
        for (keys %{$digest}) {
            $self->add_cluster_pair($_, {key => $key, val => $digest->{$_}});
        }
    }

    Clusterize::Pattern's text2digest($digest) basically returns an empty hash ref {}, therefore all of the string data is lost and nothing gets initialized for your key at all.
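
The effect described above can be reproduced without the module. This is only a minimal sketch: `text2digest_stub` is a hypothetical stand-in, assumed to behave as described (it returns an empty hash ref), and is not the module's real code. It shows why the `for (keys %{$digest})` loop in add_pair never runs its body.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Clusterize::Pattern->text2digest.
# Assumption: like the real method as described above, it returns
# an empty hash ref, so the input words are discarded.
sub text2digest_stub {
    my ($words) = @_;
    return {};
}

my $digest = text2digest_stub([ '1', '2', '3' ]);

# Mirrors the "for (keys %{$digest})" loop in add_pair:
# an empty hash has no keys, so the body is never executed.
my $iterations = 0;
for my $word (keys %{$digest}) {
    $iterations++;
}

print "loop iterations: $iterations\n";
```

With no iterations, nothing equivalent to add_cluster_pair is ever called, which is consistent with the cluster list coming back empty.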

    As others have said, just state what you'd like to actually do and someone can probably help you, but this module is not going to work.

Re: How can I use Clusterize
by ww (Archbishop) on Apr 21, 2011 at 12:04 UTC
    You could read the doc again.

    Your line 14 doesn't appear to match the example in the docs ...and the code you show leaves me (my iggerance, perhaps?) unclear about the purpose to which you're trying to put the module.

    Update: Re line 14 and from the doc: my @clusters = $clusterize->list();
    Also, fixed the link.

      Thanks for your answer. I want to use this module for text clustering. But even with the simplest code, like the one I posted, it doesn't seem to do the job. Or maybe I'm misunderstanding it.
Re: How can I use Clusterize
by believer (Sexton) on Apr 21, 2011 at 12:16 UTC
    Your code seems ok to me, but Clusterize seems a bit obscure. Maybe we can help you find another way to solve the underlying problem?
      Yes, I would appreciate your help. Actually, I'm trying to get news articles from different sources and cluster them according to their content (similar to Google News). This is my code for getting the contents of news articles:
      use WWW::Newsgrabber;
      use LWP::Simple;
      use HTML::ContentExtractor;
      use LWP::UserAgent;

      $kot_tekst='HASH|albania.htmlcomments|lajmet|free_web_stats|index.html$|ne.html$|arkivi.html$|IMG|html#c';
      my $extractor = HTML::ContentExtractor->new();
      my $agent=LWP::UserAgent->new;
      $dirname = "C:\\Users\\Administrator\\Desktop\\corpus";
      my $j=1;

      $obj[0]= WWW::Newsgrabber->new( url => 'http://www.shekulli.com.al/biznes/', regex => '\.html' );
      $obj[1] = WWW::Newsgrabber->new( url => 'http://www.gazeta-shqip.com/#/ekonomi', regex => '\.html' );

      foreach $item (@obj){
          my $ResultHashRef = $item->getNews();
          while ( my ($url,$name)=each(%{$ResultHashRef})){
              if ($url !~ /$kot_tekst/){
                  $counter++;
                  my $res=$agent->get($url);
                  my $HTML = $res->decoded_content();
                  $extractor->extract($url,$HTML);
                  $c= $extractor->as_text();
                  $c =~ m/ KOMENTE/g;
                  $c=substr($c,1,pos($c)-7);
                  $hash_biznes{$url}=$c;
              }
          }
      };
      This works fine, but as I said, I want now to cluster different articles, to find the similar ones.
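
One possible starting point for grouping similar articles, independent of Clusterize, is sketched below. It is not taken from any module: the helper names (`word_counts`, `cosine`, `cluster_texts`), the 0.4 threshold, and the sample texts are all illustrative assumptions. It compares bag-of-words vectors by cosine similarity and greedily puts each text into the first cluster whose first member is similar enough.

```perl
use strict;
use warnings;

# Count lower-cased word occurrences in a text.
sub word_counts {
    my ($text) = @_;
    my %count;
    $count{ lc $1 }++ while $text =~ /(\w+)/g;
    return \%count;
}

# Cosine similarity between two word-count hashes
# (0 = no shared words, 1 = identical word proportions).
sub cosine {
    my ($x, $y) = @_;
    my ($dot, $nx, $ny) = (0, 0, 0);
    $nx += $_ ** 2 for values %$x;
    $ny += $_ ** 2 for values %$y;
    for my $w (keys %$x) {
        $dot += $x->{$w} * $y->{$w} if exists $y->{$w};
    }
    return 0 unless $nx && $ny;
    return $dot / sqrt($nx * $ny);
}

# Greedy single-pass clustering: each text joins the first cluster whose
# seed (its first member) is at least $threshold similar; otherwise it
# starts a new cluster. Returns a list of array refs of ids.
sub cluster_texts {
    my ($texts, $threshold) = @_;    # $texts: { id => text }
    my @clusters;                    # each: { seed => counts, ids => [...] }
    for my $id (sort keys %$texts) {
        my $vec = word_counts($texts->{$id});
        my ($home) = grep { cosine($_->{seed}, $vec) >= $threshold } @clusters;
        if ($home) { push @{ $home->{ids} }, $id }
        else       { push @clusters, { seed => $vec, ids => [$id] } }
    }
    return map { $_->{ids} } @clusters;
}

# Illustrative sample data (made up for this sketch).
my %articles = (
    a1 => 'bank raises interest rates again',
    a2 => 'interest rates raised by the central bank',
    a3 => 'football team wins the national cup',
);

my @clusters = cluster_texts(\%articles, 0.4);
print join(', ', @$_), "\n" for @clusters;
```

With the sample data this groups a1 with a2 and leaves a3 on its own. A hash keyed by URL, such as %hash_biznes above, could be passed the same way (`cluster_texts(\%hash_biznes, 0.4)`), though the threshold would need tuning on real articles, and stop-word removal or TF-IDF weighting would likely be needed for useful results.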