Hi monks,
I want to use AI::Categorizer for my research, but I can't get it to work with the Reuters-21578 dataset. I'm trying to use the demo.pl file that ships with the module. What exactly should the test set, the training set, and the cats.txt file contain?
If any monks out there have used this module before, could you provide an example of working code?
Thank you.

demo.pl:
<code>
#!/usr/bin/perl

# This script is a fairly simple demonstration of how AI::Categorizer
# can be used. There are lots of other less-simple demonstrations
# (actually, they're doing much simpler things, but are probably
# harder to follow) in the tests in the t/ subdirectory. The
# eg/categorizer script can also be a good example if you're willing
# to figure out a bit how it works.
#
# This script reads a training corpus from a directory of plain-text
# documents, trains a Naive Bayes categorizer on it, then tests the
# categorizer on a set of test documents.

use strict;
use AI::Categorizer;
use AI::Categorizer::Collection::Files;
use AI::Categorizer::Learner::NaiveBayes;
use File::Spec;

die("Usage: $0 <corpus>\n".
    " A sample corpus (data set) can be downloaded from\n".
    " http://www.cpan.org/authors/Ken_Williams/data/reuters-21578.tar.gz\n".
    " or http://www.limnus.com/~ken/reuters-21578.tar.gz\n")
  unless @ARGV == 1;

my $corpus = shift;

my $training  = File::Spec->catfile( $corpus, 'training' );
my $test      = File::Spec->catfile( $corpus, 'test' );
my $cats      = File::Spec->catfile( $corpus, 'cats.txt' );
my $stopwords = File::Spec->catfile( $corpus, 'stopwords' );

my %params;
if (-e $stopwords) {
  $params{stopword_file} = $stopwords;
} else {
  warn "$stopwords not found - no stopwords will be used.\n";
}

if (-e $cats) {
  $params{category_file} = $cats;
} else {
  die "$cats not found - can't proceed without category information.\n";
}

# In a real-world application these Collection objects could be of any
# type (any Collection subclass). Or you could create each Document
# object manually. Or you could let the KnowledgeSet create the
# Collection objects for you.

$training = AI::Categorizer::Collection::Files->new( path => $training, %params );
$test     = AI::Categorizer::Collection::Files->new( path => $test, %params );

# We turn on verbose mode so you can watch the progress of loading &
# training. This looks nicer if you have Time::Progress installed!

print "Loading training set\n";
my $k = AI::Categorizer::KnowledgeSet->new( verbose => 1 );
$k->load( collection => $training );

print "Training categorizer\n";
my $l = AI::Categorizer::Learner::NaiveBayes->new( verbose => 1 );
$l->train( knowledge_set => $k );

print "Categorizing test set\n";
my $experiment = $l->categorize_collection( collection => $test );

print $experiment->stats_table;

# If you want to get at the specific assigned categories for a
# specific document, you can do it like this:

my $doc = AI::Categorizer::Document->new
  ( content => "Hello, I am a pretty generic document with not much to say." );

my $h = $l->categorize( $doc );

print ("For test document:\n",
       " Best category = ", $h->best_category, "\n",
       " All categories = ", join(', ', $h->categories), "\n");
</code>
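Going by the paths that demo.pl builds with File::Spec, the corpus directory seems to need a training/ subdirectory, a test/ subdirectory, a cats.txt file, and optionally a stopwords file. The following is only a minimal sketch that writes a toy corpus in that layout using core modules. The toy-corpus directory name, the document names, texts, and categories are all made up for illustration, and the cats.txt line format (a document file name followed by whitespace-separated category names) is an assumption that should be checked against the AI::Categorizer::Collection::Files documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical corpus root; defaults to ./toy-corpus if no argument is given.
my $corpus = @ARGV ? shift : 'toy-corpus';

# Made-up documents: file name => [ subdirectory, [categories], text ].
my %docs = (
    doc1 => [ 'training', ['grain'],          'Wheat and corn exports rose sharply.' ],
    doc2 => [ 'training', ['crude'],          'Oil prices fell as crude output grew.' ],
    doc3 => [ 'training', ['grain', 'trade'], 'Grain trade talks resume next week.' ],
    doc4 => [ 'test',     ['crude'],          'Crude oil stocks were lower this week.' ],
);

# Create the two subdirectories that demo.pl looks for.
make_path( map { File::Spec->catdir( $corpus, $_ ) } qw(training test) );

my $cats_path = File::Spec->catfile( $corpus, 'cats.txt' );
open my $cats_fh, '>', $cats_path or die "Cannot write $cats_path: $!";

for my $name ( sort keys %docs ) {
    my ( $subdir, $cats, $text ) = @{ $docs{$name} };

    # One plain-text file per document.
    my $path = File::Spec->catfile( $corpus, $subdir, $name );
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} "$text\n";
    close $fh or die "Cannot close $path: $!";

    # ASSUMED format: "<document file name> <category> [<category> ...]".
    print {$cats_fh} join( ' ', $name, @$cats ), "\n";
}

close $cats_fh or die "Cannot close $cats_path: $!";
print "Wrote toy corpus to $corpus\n";
```

If the assumed format is right, running `perl demo.pl toy-corpus` afterwards should get past the cats.txt check and print the "no stopwords will be used" warning, since this sketch writes no stopwords file.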

In reply to AI::Categorizer help by Anonymous Monk
