Hi~ I have already read the chapter "Categorization and Extraction" in Advanced Perl Programming and I almost get the idea of how to do my work. But I still have some questions, and I really hope you can help me solve them. Here are the questions:

1. When I train categories, the code you wrote is:

my $positive = {
    word1 => 2,
    word2 => 4,
    word3 => 1,
};

Does "word1 => 2" mean the number of times word1 appears in the positive sentences? If I have 100 sentences to use as training sentences, do I need to produce a hash for all the words in these sentences? Is there an easier way to train on all the sentences?
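To show the kind of "easier way" I am hoping for, here is a minimal sketch of what I imagine, assuming each sentence is already stemmed and stop-worded and its words are separated by whitespace. The helper name sentence_to_hash and the sample sentence are my own inventions, not from the book or from your reply:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the book): count how often each word
# occurs in one sentence that is already stemmed and stop-worded.
sub sentence_to_hash {
    my ($sentence) = @_;
    my %count;
    # Split on whitespace; grep drops the empty field a leading space produces.
    $count{$_}++ for grep { length } split /\s+/, lc $sentence;
    return \%count;
}

# Made-up example sentence, already reduced to stems.
my $positive = sentence_to_hash('good movi good act');
printf "%s => %d\n", $_, $positive->{$_} for sort keys %$positive;
```

If this is right, I would never have to type a hash by hand; I would just call the helper once per sentence.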

2. In the book, the author inverts the document into a hash of words and weights, using the code below:

sub invert_string {
    my ($string, $weight, $hash) = @_;
    $hash->{$_} += $weight for
        grep { !$StopWords{$_} }
        @{words(lc($string))};
}

But I have already done the stemming and stop-word removal in advance, so I think the code you wrote:

my $sentence1 = {
    wordA => 2,
    wordB => 1,
};

has the same function. Is there any difference? If I have 100 training sentences, do I need to type out all the different words in these training sentences using the code above?

3. If I have hundreds of sentences to make predictions on, how can I invert all the sentences into hash variables?

4. I cannot quite understand what this code does:

sub invert_item {
    my $item = shift;
    my %hash;
    invert_string($item->{title}, 2, \%hash);
    invert_string($item->{description}, 1, \%hash);
    return \%hash;
}

Is it true that, because I do not need to weight the title and the content separately, I can ignore this step?

5. Here is the code in the book to train the analyzer:

#!/usr/bin/perl

use XML::RSS;
use Algorithm::NaiveBayes;
use Lingua::EN::Splitter qw(words);
use Lingua::EN::StopWords qw(%StopWords);

my $nb = Algorithm::NaiveBayes->new(  );

for my $category (qw(interesting boring)) {
    my $rss = new XML::RSS;
    $rss->parsefile("$category.rdf");
    $nb->add_instance(attributes => invert_item($_),
                      label      => $category) for @{$rss->{'items'}};
}

$nb->train; # Work out all the probabilities

I don't understand what this part does:

my $rss = new XML::RSS;
$rss->parsefile("$category.rdf");
$nb->add_instance(attributes => invert_item($_),
                  label      => $category) for @{$rss->{'items'}};

If I skip the invert_item step, what should I write in place of "@{$rss->{'items'}}"?
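My guess is that the RSS items could be replaced by a plain list of my own sentences, something like the sketch below. This is only what I imagine, not code from the book: the %training hash, its labels, and the sample sentences are all made up, and I assume every sentence is already stemmed and stop-worded. The only module calls I use are new, add_instance, train and predict from Algorithm::NaiveBayes.

```perl
use strict;
use warnings;
use Algorithm::NaiveBayes;

# Made-up training data: label => list of preprocessed sentences.
my %training = (
    positive => [ 'good movi', 'great act good plot' ],
    negative => [ 'bad movi',  'bore plot bad act'   ],
);

my $nb = Algorithm::NaiveBayes->new;

for my $category (sort keys %training) {
    for my $sentence (@{ $training{$category} }) {
        # Build the word => count hash for this one sentence.
        my %words;
        $words{$_}++ for split ' ', $sentence;
        $nb->add_instance(attributes => \%words, label => $category);
    }
}

$nb->train; # Work out all the probabilities

# Score one new (made-up) sentence against the trained categories.
my %unseen;
$unseen{$_}++ for split ' ', 'good plot';
my $scores = $nb->predict(attributes => \%unseen);
printf "%s: %.3f\n", $_, $scores->{$_} for sort keys %$scores;
```

Here everything sits in a single script, which is what I would prefer if it is allowed.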

6. Should all the code you wrote above go in one Perl file, or does it have to be written in separate files that refer to each other by name?

Your reply will surely be helpful. Thank you so much!!

In reply to Re^2: how to use Algorithm::NaiveBayes module by agnes
in thread how to use Algorithm::NaiveBayes module by agnes