
Here is the new batch. I have used podchecker as you told me, and I no longer get any warnings.
I have also rewritten the SYNOPSIS and DESCRIPTION sections, putting everything in its own place.
It looks much nicer in perldoc.
Eager for feedback and ways to get better, as always.

package Security::Monitoring::Detection::Markov;

use 5.006;
use strict;
use warnings;
use Storable qw(nstore retrieve);
use Carp;
use Security::Monitoring::Utils;
use parent qw(Security::Monitoring::Detection::Detector);

our $dict_file = '/usr/share/dict/words';

=head1 NAME

Security::Monitoring::Detection::Markov - Markov chain based automata class

=head1 VERSION

Version 0.01

=cut

our $VERSION = '0.01';


=head1 SYNOPSIS

    my $class     = 'Security::Monitoring::Detection::Markov';
    my $automaton = $class->new($alert_level);

    $automaton->sys_learn;                          # learn from the system dict file
    $automaton->learn('filename');                  # learn from a file
    $automaton->learn("word1\nword2\nword3", 1);    # learn from a string
    my $score = $automaton->get_score('word');      # get the score for a word

    # make the automaton return its mean score whenever it is asked about 'word'
    $automaton->flag_as_false_positive('word');

    my $false_positives_ref = $automaton->get_false_positives;
    foreach my $fp (@{$false_positives_ref}) {
        # do things with each false positive string returned
    }



=head1 DESCRIPTION


This module allows you to summon Markov chain based automata that, after
being given a dictionary to learn from, give marks to words based on how
likely these words are to be part of the language used in the dictionary
they learnt.

About the alert level: I have an intuition that the scores of a dictionary's
words should more or less follow a normal distribution around the mean, so
you might want to set the alert level as a multiple of the standard
deviation.


=head1 SUBROUTINES/METHODS


new, learn, sys_learn, get_score, flag_as_false_positive, get_false_positives

=head2 new

Summoning method (constructor). Give it a number, interpreted as a
percentage, that determines when the automaton will trigger an alert: an
alert is raised when a word's score deviates from the mean score of all
learnt words by at least that percentage.

For example, an alert level of 10 with a mean score of 1 over all learnt
words makes the automaton trigger an alert for any word whose score is
greater than or equal to 1.1, or lower than or equal to 0.9.


=cut

sub new {
    my ($class, $alert_level) = @_;
    my $self = {};
    bless $self, $class;
    $self->_init($alert_level);
    return $self;
}


=head2 _init

Separate initialisation method, to make inheritance easier.

=cut

sub _init {
    my ($self, $alert_level) = @_;
    # accept only a non-negative number such as 10 or 2.5
    if (!defined $alert_level || $alert_level !~ /^\d+(?:\.\d+)?$/) {
        croak "alert level undefined or not a number!\n";
    }
    $self->{t_number}        = 0;               # total transitions recorded
    $self->{word_count}      = 0;               # total words learnt
    $self->{mean_score}      = 0;               # mean score of the learnt words
    $self->{alert_level}     = $alert_level;    # percentage given to new()
    $self->{t_table_ref}     = {};              # transitions: {from}{to} = count
    $self->{false_positives} = {};              # words flagged as false positives
}

=head2 learn

This method makes the automaton read through a dictionary made of multiple
lines, one word per line. Beware: telling an already educated automaton to
learn again will change whatever results you obtain afterwards.

It takes the name of the dictionary file and populates the automaton's
transition table with it.

Each object has a transition table which is in fact made of two levels of
hash tables. The first level has one key for each element of the alphabet
plus one for the empty word; each value is a reference to another hash table
whose keys are the letters to which the transition is made and whose values
are the number of times that transition was seen.

The last parameter is a boolean: leave it out to indicate that the first
argument is a file name, or pass a true value such as 1 to indicate that it
is the dictionary itself, given as a string.

It returns the number of words learnt.

=cut

sub learn {
    my ($self, $file_name, $is_string) = @_;
    my $fh;
    my $hash         = $self->{t_table_ref};
    my $words_learnt = 0;

    # choose whether we are learning from a string or from a file
    if (!$is_string) {
        if (!defined $file_name) {
            croak "using an undefined filename\n";
        }
        open $fh, '<', $file_name
            or croak "could not open dictionary file $file_name: $!\n";
    }
    else {
        if (!defined $file_name) {
            croak "sorry, that string is undefined, can not read from it\n";
        }
        elsif ($file_name eq '') {
            croak "sorry, that string is empty, nothing to read!\n";
        }
        open $fh, '<', \$file_name
            or croak "could not read from string: $!\n";
    }

    # read each word and update the transition table with it
    while (my $word = <$fh>) {
        chomp $word;
        $word = lc $word;                  # put it in full lowercase
        $word =~ s/[^[:alnum:]]//g;        # get rid of non-alphanumeric symbols
        my $last_letter = 'empty';         # the first transition starts from the empty word
        foreach my $cur_letter (split //, $word) {
            # a missing count is undef, and ++ turns it into 1
            $hash->{$last_letter}{$cur_letter}++;
            $self->{t_number}++;
            $last_letter = $cur_letter;
        }
        $words_learnt++;
    }

    # second pass: score the new words against the updated table
    my $total = 0;
    seek $fh, 0, 0;
    while (my $word = <$fh>) {
        chomp $word;
        $total += $self->get_score($word);    # get_score normalises the word itself
    }
    close $fh;

    # fold the new words into the running mean; $total is already a sum
    my $new_count = $self->{word_count} + $words_learnt;
    if ($new_count > 0) {
        $self->{mean_score} =
            ($self->{mean_score} * $self->{word_count} + $total) / $new_count;
    }
    $self->{word_count} = $new_count;

    return $words_learnt;
}

=head2 _transitions

Getter for the t_number field (the total number of transitions recorded).

=cut

sub _transitions {
    my $self = shift;
    return $self->{t_number};
}

=head2 get_score

This method is used once the automaton has learnt enough words ("enough" is
left to the judgement of the user; dictionary files are a good source, I
reckon). It gives you a score that tells you whether a word's letter
ordering is unusual: the higher the score, the more unusual the word. Known
words should score close to 0, since normal transitions such as consonant to
vowel add little to the score, while strange ones (such as xxxbxczbbwdx) add
a lot.

Keep in mind that a completely unknown transition adds one point, while a
transition that has already been seen adds:

    number of times this transition occurred / total number of transitions recorded

=cut

sub get_score {
    my ($self, $word) = @_;
    my $score = 0;
    chomp $word;
    $word = lc $word;
    $word =~ s/[^[:alnum:]]//g;    # get rid of non-alphanumeric symbols

    # flagged words always get the mean score, so they never trigger an alert
    if (exists $self->{false_positives}{$word}) {
        return $self->{mean_score};
    }

    my $table       = $self->{t_table_ref};
    my $last_letter = 'empty';
    foreach my $l (split //, $word) {
        # look the transition up without autovivifying a new row in the table
        my $count = $table->{$last_letter} ? $table->{$last_letter}{$l} : undef;
        # seen transition: count / total transitions; unseen transition: 1
        $score += defined $count ? $count / $self->{t_number} : 1;
        $last_letter = $l;
    }
    return $score;
}


=head2 flag_as_false_positive

This method allows you to flag a specific word as a false positive.
Afterwards, whenever the automaton calculates the score of that word, it
returns its mean score instead, so the word never triggers an alert.

=cut

sub flag_as_false_positive {
    my ($self, $word) = @_;
    chomp $word;
    $word = lc $word;
    $word =~ s/[^[:alnum:]]//g;    # same normalisation as get_score
    $self->{false_positives}{$word} = 1;
}

=head2 get_false_positives

Returns a reference to an array of the words flagged as false positives.

=cut

sub get_false_positives {
    my $self = shift;
    my @false_positives = keys %{ $self->{false_positives} };
    return \@false_positives;
}

=head2 sys_learn

this subroutines uses the learn function on the system's /usr/share/di
+ct/words
file if it exists. if it does not it will return 0. If it does it will
+ return
the number of learnt words

if the file is empty it will return -1;

=cut

sub sys_learn {
    my $self = shift;
    if (!ref $self) {
        croak "can not call sys_learn as a class method!\n";
    }
    my $file_name = $dict_file;
    if (!-e $file_name) {
        croak "dictionary file $file_name (package variable \$dict_file) does not exist\n";
    }
    return $self->learn($file_name);
}

1; # End of Markov
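To sanity-check the two-level transition table described in the learn POD and the arithmetic get_score performs (a seen transition adds its count divided by the total number of transitions, an unseen one adds 1), here is a minimal standalone sketch. build_table and score_word are hypothetical helper names, not part of the module, and the three-word dictionary is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical standalone helpers (not part of the module's API) that mimic
# what learn() builds and what get_score() computes.

# Build a two-level transition table: $table{$from}{$to} counts how often
# letter $to followed $from; 'empty' stands for the start of a word.
sub build_table {
    my @words = @_;
    my %table;
    my $total = 0;    # total number of transitions recorded
    for my $word (@words) {
        $word = lc $word;
        $word =~ s/[^[:alnum:]]//g;    # keep only alphanumeric characters
        my $last = 'empty';
        for my $letter (split //, $word) {
            $table{$last}{$letter}++;  # a missing count is undef, ++ makes it 1
            $total++;
            $last = $letter;
        }
    }
    return (\%table, $total);
}

# Score a word: a seen transition adds count / total, an unseen one adds 1,
# so words made of unseen transitions score high.
sub score_word {
    my ($table, $total, $word) = @_;
    $word = lc $word;
    $word =~ s/[^[:alnum:]]//g;
    my $score = 0;
    my $last  = 'empty';
    for my $letter (split //, $word) {
        # look up without autovivifying a new row in the table
        my $count = $table->{$last} ? $table->{$last}{$letter} : undef;
        $score += defined $count ? $count / $total : 1;
        $last = $letter;
    }
    return $score;
}

my ($table, $total) = build_table(qw(banana bandana cabana));
printf "%-8s %.3f\n", $_, score_word($table, $total, $_) for qw(banana zqxj);
```

With this toy dictionary there are 19 transitions in total, so banana scores 23/19 (printed as 1.211) while zqxj, made only of unseen transitions, scores 4.000. Because the score is a sum over transitions, it also grows with word length, which is worth keeping in mind when choosing an alert level.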
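The module as posted has no method that actually compares a score against the alert level, so here is a hedged sketch of the two rules discussed in the POD: the percentage-of-the-mean rule described for new, and the standard-deviation idea from DESCRIPTION. percent_alert, mean_and_sd and sd_alert are hypothetical names, and using the population standard deviation (dividing by n rather than n - 1) is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helpers (not part of the module) for the two alert rules
# discussed in the POD.

# Rule described for new(): alert when a score differs from the mean by at
# least $percent per cent of the mean, in either direction.
sub percent_alert {
    my ($score, $mean, $percent) = @_;
    return abs($score - $mean) >= $mean * $percent / 100;
}

# Mean and population standard deviation of a list of scores.
sub mean_and_sd {
    my @scores = @_;
    die "need at least one score\n" unless @scores;
    my $mean = sum(@scores) / @scores;
    my $var  = sum(map { ($_ - $mean) ** 2 } @scores) / @scores;
    return ($mean, sqrt $var);
}

# Idea from DESCRIPTION: alert when a score lies at least $k standard
# deviations away from the mean.
sub sd_alert {
    my ($score, $mean, $sd, $k) = @_;
    return abs($score - $mean) >= $k * $sd;
}

my ($mean, $sd) = mean_and_sd(2, 4, 4, 4, 5, 5, 7, 9);
printf "mean=%g sd=%g\n", $mean, $sd;
print percent_alert(1.2, 1, 10) ? "1.2 vs mean 1 at 10%: alert\n" : "no alert\n";
print sd_alert(9, $mean, $sd, 2) ? "9 is 2 sd from the mean: alert\n" : "no alert\n";
```

Scores that fall exactly on the boundary, such as 0.9 in the example from new, may land on either side of the comparison because of floating-point rounding, so a small tolerance may be worth adding if that matters.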

In reply to Re^4: Markov Chain automata class by QuillMeantTen
in thread Markov Chain automata class by QuillMeantTen
