
The first thing you need to do is "train" the categorizer with known labels. For each category, build a hash with the words as keys and their weights as values. The weight is typically the number of times the word occurs, but you could use other criteria.

my $positive = {
    word1 => 2,
    word2 => 4,
    word3 => 1,
};
my $negative = {
    word4 => 3,
    word5 => 1,
};

It is a good idea to normalize each word to lower case, perhaps stem it, and remove words that have no effect on the outcome (stopwords). You then add these hashes to the categorizer:

use Algorithm::NaiveBayes;

my $categorizer = Algorithm::NaiveBayes->new;

$categorizer->add_instance(
    attributes => $positive,
    label => 'positive');

$categorizer->add_instance(
    attributes => $negative,
    label => 'negative');

$categorizer->train;
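As a rough sketch of how such a hash might be built from raw text, the following counts lower-cased words and skips stopwords. The helper name `word_counts` and the tiny stopword list are made up for illustration; stemming is left out, and the tokenizing regex assumes plain English text.

```perl
use strict;
use warnings;

# Hypothetical stopword list; a real one would be much longer.
my %stopword = map { $_ => 1 } qw(a an and is it of the this to);

# Turn raw text into a { word => count } hash reference.
sub word_counts {
    my ($text) = @_;
    my %count;
    # Lower-case the text, then pull out runs of letters and apostrophes.
    for my $word (lc($text) =~ /[a-z']+/g) {
        next if $stopword{$word};    # drop words that carry no signal
        $count{$word}++;
    }
    return \%count;
}

my $attributes = word_counts('This is a great, GREAT movie');
# $attributes is now { great => 2, movie => 1 }
```

The returned hash reference has the same shape as $positive and $negative above, so it can be passed as the attributes argument.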

Then, for each of your sentences, you create a hash in the same way and call predict(). It returns a hash reference whose keys are the labels and whose values are scores between 0 and 1, the highest score marking the most likely classification:

my $sentence1 = {
    wordA => 2,
    wordB => 1,
};

my $probability = $categorizer->predict(attributes => $sentence1);

if ($probability->{'positive'} > 0.5) {
    # sentence1 probably positive
}
elsif ($probability->{'negative'} > 0.5) {
    # sentence1 probably negative
}
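If you have more than two labels, or just want the single most likely one, picking the highest-scoring key may be simpler than a fixed threshold. This is only a sketch: `best_label` is a made-up helper, and the scores in the example are invented rather than real output from the module.

```perl
use strict;
use warnings;

# Return the label with the highest score, or undef for an empty hash.
# Ties are broken alphabetically so the result is deterministic.
sub best_label {
    my ($scores) = @_;
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b }
                 keys %$scores;
    return $best;
}

# Invented scores, standing in for the hash reference predict() returns.
my $label = best_label({ positive => 0.93, negative => 0.37 });
# $label is 'positive'
```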

The book Advanced Perl Programming (2nd Edition) has a section entitled "Categorization and Extraction" that shows extended examples of using this module together with sentence splitters, stopword lists and stemmers.

In reply to Re: how to use Algorithm::NaiveBayes module by tangent
in thread how to use Algorithm::NaiveBayes module by agnes
