Hi, I have already read the chapter "Categorization and Extraction" in Advanced Perl Programming and mostly understand how to do my work, but I still have some questions that I hope you can help me with. Here they are:
1. When I train categories, the code you wrote is:
my $positive = {
    word1 => 2,
    word2 => 4,
    word3 => 1,
};
Does "word1 => 2" mean the number of times word1 appears in the positive sentences? If I have 100 training sentences, do I need to build a hash by hand covering all the words in those sentences? Is there an easier way to train on all of them?
2. In the book, the author inverts a document into a hash of words and weights using the code below:
sub invert_string {
    my ($string, $weight, $hash) = @_;
    $hash->{$_} += $weight for
        grep { !$StopWords{$_} }
        @{words(lc($string))};
}
But I have already done the stemming and stop-word removal in advance, so I think the code you wrote:
my $sentence1 = {
    wordA => 2,
    wordB => 1,
};
does the same thing. Is there any difference? If I have 100 training sentences, do I need to type out all the different words in them using the code above?
3. If I have hundreds of sentences to make predictions on, how can I invert all of them into hash variables?
4. I do not quite understand the purpose of this code:
sub invert_item {
    my $item = shift;
    my %hash;
    invert_string($item->{title}, 2, \%hash);
    invert_string($item->{description}, 1, \%hash);
    return \%hash;
}
Is it true that, because I do not need to weight the title and the content separately, I can skip this step?
5. Here is the code in the book to train the analyzer:
#!/usr/bin/perl
use XML::RSS;
use Algorithm::NaiveBayes;
use Lingua::EN::Splitter qw(words);
use Lingua::EN::StopWords qw(%StopWords);
my $nb = Algorithm::NaiveBayes->new( );
for my $category (qw(interesting boring)) {
    my $rss = new XML::RSS;
    $rss->parsefile("$category.rdf");
    $nb->add_instance(attributes => invert_item($_),
                      label      => $category) for @{$rss->{'items'}};
}
$nb->train; # Work out all the probabilities
I don't understand what this part does:
my $rss = new XML::RSS;
$rss->parsefile("$category.rdf");
$nb->add_instance(attributes => invert_item($_),
                  label      => $category) for @{$rss->{'items'}};
If I skip the invert_item step, what should I write in place of "@{$rss->{'items'}}"?
6. Should all the code above be written in one Perl file, or does it have to be written in separate files that refer to each other by name?
Your reply will surely be helpful. Thank you so much!
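To make questions 1, 3 and 5 more concrete, here is a rough sketch of what I imagine might work instead of typing every hash by hand. It is only a guess under my own assumptions: the helper name sentence_to_hash, the %training structure and the sample sentences are all made up for illustration and are not from the book. It assumes each sentence is already stemmed, stripped of stop words, and separated by whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the book): turn one already stemmed,
# stop-word-free sentence into a hash of word => number of occurrences.
sub sentence_to_hash {
    my ($sentence) = @_;
    my %count;
    $count{$_}++ for grep { length } split /\s+/, lc $sentence;
    return \%count;
}

# Made-up training data: label => list of preprocessed sentences.
my %training = (
    positive => [ 'good movi good act', 'love stori' ],
    negative => [ 'bad plot', 'bore bad act' ],
);

# Build one attribute hash per sentence, paired with its label.
my @instances;
for my $label (sort keys %training) {
    for my $sentence (@{ $training{$label} }) {
        my $attributes = sentence_to_hash($sentence);
        push @instances, { label => $label, attributes => $attributes };
        # With Algorithm::NaiveBayes loaded, I assume this is the place for:
        # $nb->add_instance(attributes => $attributes, label => $label);
    }
}

# The same helper would invert the sentences I want predictions for.
my @to_predict = map { sentence_to_hash($_) } ('good stori', 'bad movi');

printf "%d training instances, %d sentences to predict\n",
    scalar @instances, scalar @to_predict;
```

If this is roughly right, the loop over %training would take the place of the loop over the RSS items, and invert_item would not be needed at all. Please correct me if I have misunderstood.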