Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:


Hello friends, I wrote a script to build a similarity matrix from a file (corpus). It should take each word and compare its similarity to the others, and the results should be saved in a file as an NxN matrix, where N is the number of words. I get some errors. Can anyone tell me where I went wrong?
#! /usr/local/bin/perl -w
use strict;
use warnings;
use WordNet::QueryData;
use WordNet::Similarity::random;
use WordNet::Similarity::path;
use WordNet::Similarity::wup;
use WordNet::Similarity::lch;
use WordNet::Similarity::jcn;
use WordNet::Similarity::res;
use WordNet::Similarity::lin;
use WordNet::Similarity::hso;
use WordNet::Similarity::lesk;
use WordNet::Similarity::vector;
use WordNet::Similarity::vector_pairs;

my $Infile = shift;
my $Outfile = shift;
my $Measure = shift;

unless (defined $Infile and defined $Outfile and defined $Measure) {
    print STDERR "Undefined input\n";
    print STDERR "Usage: simmat.pl inputfile outputfile measure()\n";
    exit 1;
}

print STDERR "Loading WordNet... ";
my $wn = WordNet::QueryData->new;
die "Unable to create WordNet object.\n" if(!$wn);
print STDERR "done.\n";

open (FILE, "$Infile");
@words = <FILE>;
#print $words[0];
close(FILE);

for my $i (0 .. $#words) {
    for my $j ( ($i+1) .. $#words) {
        $sim[$i][$j] = similarity( $words[$i], $words[$j])
        $sim[$j][$i] = $sim[$i][$j];
    }
}

sub similarity {
    my ( $w1, $w2 ) = @_;
    my $sim = 1;
    my $obj = WordNet::Similarity::"$Measure"->new($wn);
    my $sim = $obj->getRelatedness($w1, $w2);
    return sim;
}

open (OUTPUT, ">$Outfile");
print OUTPUT @sim;
close(OUTPUT);

I really appreciate your help.

Replies are listed 'Best First'.
Re: building similarity matrix
by pc88mxer (Vicar) on Feb 04, 2008 at 05:18 UTC
    This doesn't work:

    my $obj = WordNet::Similarity::"$Measure"->new($wn);

    A better solution is to put the entire package name in a variable:

    my $measure_class = "WordNet::Similarity::whatever";
    my $measure = $measure_class->new($wn);
    ...
    sub similarity {
        ...
        return $measure->getRelatedness($w1, $w2);
    }

    Besides fixing the syntax error, this will also create the measure object only once at the beginning of your script.
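
    A minimal sketch of the class-name-in-a-variable idea follows. So that it runs without WordNet installed, it uses a stand-in package, Demo::Similarity::path, which is purely hypothetical and returns a toy score. With the real modules the class name would be something like WordNet::Similarity::path, and the constructor would receive the WordNet::QueryData object instead of undef.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a WordNet::Similarity::* class,
# so the sketch runs without WordNet installed.
package Demo::Similarity::path;

sub new {
    my ($class, $wn) = @_;
    return bless { wn => $wn }, $class;
}

# Toy score (an assumption for illustration): 1 for identical words, else 0.
sub getRelatedness {
    my ($self, $w1, $w2) = @_;
    return $w1 eq $w2 ? 1 : 0;
}

package main;

my $measure_name  = 'path';    # in the real script this would come from @ARGV
my $measure_class = "Demo::Similarity::$measure_name";

# Calling a class method through a string is allowed under "use strict".
# The object is created once, before any comparisons are made.
my $measure = $measure_class->new(undef);

sub similarity {
    my ($w1, $w2) = @_;
    return $measure->getRelatedness($w1, $w2);
}

print similarity('cat', 'cat'), "\n";
print similarity('cat', 'dog'), "\n";
```

    The point of the design is that the package name is built as an ordinary string first, and only then used on the left of the arrow.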

    Also, note that using @words = <FILE> will leave the end of line characters on your strings, but maybe that's not a problem. If it is, just chomp your words:

    chomp(@words = <FILE>);
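
    As a small illustration of the difference, the sketch below reads a word list from an in-memory string (standing in for the input file, which is an assumption made so the example is self-contained) and chomps it in the same statement. Without the chomp, each word would reach getRelatedness with a trailing newline attached.

```perl
use strict;
use warnings;

# In-memory "file" standing in for the word list, one word per line.
my $data = "cat\ndog\nbird\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

# Read every line, then strip the trailing newline from each element.
chomp(my @words = <$fh>);
close($fh);

print join(',', @words), "\n";
```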

Re: building similarity matrix
by GrandFather (Saint) on Feb 04, 2008 at 04:20 UTC

    Telling us what error you get would be a good start. Providing a very small sample data set that demonstrates the error may help too.


    Perl is environmentally friendly - it saves trees

      Dear friend, here is the error I get after running my code like this: simmat.pl 1.txt 2.txt res
      Scalar found where operator expected at simmat.pl line 43, near ") $sim"
      (Missing operator before $sim?)
      Global symbol "@words" requires explicit package name at simmat.pl line 36.
      Global symbol "@words" requires explicit package name at simmat.pl line 40.
      Global symbol "@words" requires explicit package name at simmat.pl line 41.
      Global symbol "@sim" requires explicit package name at simmat.pl line 42.
      Global symbol "@words" requires explicit package name at simmat.pl line 42.
      Global symbol "@words" requires explicit package name at simmat.pl line 42.
      syntax error at simmat.pl line 43, near ") $sim"
      Global symbol "@sim" requires explicit package name at simmat.pl line 43.
      Global symbol "@sim" requires explicit package name at simmat.pl line 43.
      Global symbol "@sim" requires explicit package name at simmat.pl line 57.
      Execution of simmat.pl aborted due to compilation errors.

      Any idea where the error comes from?
      I could figure out the global variable errors, but how do I solve the "Scalar found where operator expected" error? Is there any way to fix it? Thanks.

        pc88mxer has provided the answer at Re: building similarity matrix.

        You could set $Measure by:

        my $Measure = "WordNet::Similarity::" . shift;

        then do as pc88mxer suggests and create the measure object once outside the loop.
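
        For the rest of the reported errors, here is a hedged sketch of the matrix loop on its own. The message near ") $sim" points at the first assignment inside the inner loop, which has no terminating semicolon, and the "Global symbol" messages go away once @words and @sim are declared with my. The similarity function below is a hypothetical stand-in (a word-length difference) so the sketch runs without WordNet. Filling the diagonal and writing one row per line are additions that the original script does not have.

```perl
use strict;
use warnings;

my @words = qw(cat dog bird);

# Hypothetical stand-in for $measure->getRelatedness($w1, $w2).
sub similarity {
    my ($w1, $w2) = @_;
    return abs(length($w1) - length($w2));
}

my @sim;    # declared with "my" to satisfy "use strict"
for my $i (0 .. $#words) {
    # Diagonal entry (an addition; the original loop skips it).
    $sim[$i][$i] = similarity($words[$i], $words[$i]);
    for my $j ($i + 1 .. $#words) {
        $sim[$i][$j] = similarity($words[$i], $words[$j]);    # semicolon required
        $sim[$j][$i] = $sim[$i][$j];                          # matrix is symmetric
    }
}

# @sim holds array references, so printing it directly would emit
# strings like ARRAY(0x...). Join each row instead.
my @rows = map { join(' ', @$_) } @sim;
print "$_\n" for @rows;
```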

