kulls has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,
I'm going to use the Statistics::Shannon module in my application.
I have written a simple Perl script that compares two sets of data.
Please tell me whether this is the correct way to compare values using the Shannon index.
I would be grateful if anyone could give me more details.
Here is the link to the Shannon module for your quick reference.
What about the evenness method in that module? Do I also need to consider the similarity of the frequencies in order to compute these differences?
#!c:\perl\bin\perl
use strict;
use warnings;
use Statistics::Shannon;
use Data::Dumper;

my $data1=[3,1,3,2,3];
my $data2=[4,1,3,2,3];
my $base_index=2;
my $shannon1=Statistics::Shannon->new($data1,$base_index);
my $shannon2=Statistics::Shannon->new($data2,$base_index);
my $output1=$shannon1->index;
my $output2=$shannon2->index;
print $output1."\t".$output2;
print "\tNot match" if( $output2 != $output1);
Thanks,
-kulls

Re: Shannon Index
by GrandFather (Saint) on Sep 04, 2006 at 19:15 UTC

    You should perhaps tell us more about your application and what it is that you wish to achieve using the Shannon index. I would think it rather unusual to test for equality.

    Note that $output1."\t".$output2 is probably better written "$output1\t$output2".
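
    To illustrate why an equality test on the index is unusual, here is a minimal sketch that computes the index by hand rather than through the module. It assumes the index is taken over the relative frequencies of the distinct values in the list, which may not be how Statistics::Shannon reads its input. The helpers shannon_index and nearly_equal are hypothetical names, not part of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shannon index H = -sum(p_i * log_b(p_i)), where p_i is the relative
# frequency of each distinct value in the list. Treating the list as raw
# observations is an assumption; check the module documentation.
sub shannon_index {
    my ($data, $base) = @_;
    my %count;
    $count{$_}++ for @$data;
    my $n = @$data;
    my $h = 0;
    for my $c (values %count) {
        my $p = $c / $n;
        $h -= $p * log($p) / log($base);
    }
    return $h;
}

# Compare two floating-point results with a tolerance instead of == or !=.
sub nearly_equal {
    my ($x, $y, $eps) = @_;
    $eps = 1e-9 unless defined $eps;
    return abs($x - $y) < $eps;
}

my $h1 = shannon_index([3, 1, 3, 2, 3], 2);   # counts 3, 1, 1
my $h2 = shannon_index([9, 7, 9, 8, 9], 2);   # different values, same counts
my $h3 = shannon_index([4, 1, 3, 2, 3], 2);   # counts 1, 1, 2, 1

printf "%.4f\t%.4f\t%.4f\n", $h1, $h2, $h3;
print "h1 and h2 match although the data differ\n" if nearly_equal($h1, $h2);
print "h1 and h3 differ\n" unless nearly_equal($h1, $h3);
```

    Under that assumption, two lists with no value in common can still produce the same index, so equal indexes do not imply similar data. The tolerance check also avoids relying on exact equality of floating-point results.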


    DWIM is Perl's answer to Gödel
      Hi,
      Thanks for your suggestion.
      I want to measure the similarity between $data1 and $data2 using this Shannon implementation.
      I came up with another way of finding similarities or information, which could be:
      #!c:\perl\bin\perl
      use strict;
      use warnings;
      use Statistics::Shannon;
      use Data::Dumper;

      my @data1=(1,3,2,4,6);
      my @data2=(2,3,2,3,6);
      my $data=[(@data1,@data2)];
      my $base_index=2;
      my $shannon=Statistics::Shannon->new($data,$base_index);
      my $output=$shannon->index;
      print Dumper ($data);
      print "$output";

      Please tell me if I'm wrong.
      Thanks,
      -kulls

        You still haven't said anything about your application domain. From a Perl perspective there seem to be no hard choices to make - the issue is more to do with how you use the result than how you calculate it.
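
        As a rough illustration of that point, the sketch below pools two lists before computing one index, in the same spirit as concatenating @data1 and @data2. It assumes the index is computed from the frequencies of the distinct values, and shannon_index and pooled_index are hypothetical helpers written for this example, not calls into Statistics::Shannon.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-rolled Shannon index over the frequencies of distinct values
# (an assumption about how the input is interpreted).
sub shannon_index {
    my ($data, $base) = @_;
    my %count;
    $count{$_}++ for @$data;
    my $n = @$data;
    my $h = 0;
    for my $c (values %count) {
        my $p = $c / $n;
        $h -= $p * log($p) / log($base);
    }
    return $h;
}

# Index of the two lists concatenated into one sample.
sub pooled_index {
    my ($first, $second, $base) = @_;
    return shannon_index([@$first, @$second], $base);
}

# Case A: the two lists are identical.
my $same = pooled_index([1, 2], [1, 2], 2);

# Case B: the two lists share no value at all.
my $disjoint = pooled_index([1, 1], [2, 2], 2);

printf "pooled, identical lists: %.4f\n", $same;
printf "pooled, disjoint lists:  %.4f\n", $disjoint;
```

        Both pooled samples contain two 1s and two 2s, so the pooled index is the same in both cases. Once the lists are concatenated, the index no longer carries any information about which list a value came from, so on its own it cannot serve as a similarity measure between the two.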


        DWIM is Perl's answer to Gödel