I couldn't find a way to count the tags using the Tagger module alone, but since the tagged text is XML-style markup, you can wrap it in a root element and hand it to an XML parser. This example uses
XML::LibXML. I first tried
Text::Balanced, but gave up because I couldn't work out how to extract several different tags at once. You will need a full list of the tags, which you can build from the Tagger
README document.
use strict;
use warnings;
use Data::Dumper;
use XML::LibXML;
use Lingua::EN::Tagger qw(add_tags);
my $text = <<'EOT';
The set of POS tags used here is a modified version of the
Penn Treebank tagset. Tags with non-letter characters have been
redefined to work better in our data structures.
EOT
my $tagger = Lingua::EN::Tagger->new;
my $tagged = $tagger->add_tags($text);
my $tree = XML::LibXML->load_xml(string => "<doc>$tagged</doc>");
my @tags = qw(CC DET NN NNP RB VBZ); # add the rest
my %count;
for my $tag (@tags) {
    my $lctag = lc($tag); # lowercase tag name
    my @nodes = $tree->findnodes("//$lctag");
    $count{$tag} = scalar @nodes;
}
print Dumper(\%count);
for my $tag (sort keys %count) {
    print "$tag:\t$count{$tag}\n";
}
Output:
$VAR1 = {
          'CC' => 0,
          'VBZ' => 1,
          'RB' => 2,
          'NNP' => 3,
          'NN' => 4,
          'DET' => 3
        };
CC: 0
DET: 3
NN: 4
NNP: 3
RB: 2
VBZ: 1
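If you'd rather not maintain the tag list by hand, one option is to count whatever tags actually appear in the tagged text. Here is a minimal sketch, assuming the tagger output is a flat sequence of `<tag>word</tag>` pairs as in the example above. It swaps the XML parser for a plain regex. The sample string and the count_tags helper are made up for illustration; the string is hand-written, not real Tagger output.

```perl
use strict;
use warnings;

# Hypothetical sample in the style of add_tags() output (hand-written,
# not real tagger output); in practice use $tagger->add_tags($text).
my $tagged = '<det>The</det> <nn>set</nn> <in>of</in> <nnp>POS</nnp> '
           . '<nns>tags</nns> <det>a</det> <nn>version</nn>';

# Count every opening tag found in the string.
# Closing tags start with "</", so the pattern never matches them.
sub count_tags {
    my ($text) = @_;
    my %count;
    $count{ uc $1 }++ while $text =~ /<(\w+)>/g;
    return \%count;
}

my $count = count_tags($tagged);
for my $tag (sort keys %$count) {
    print "$tag:\t$count->{$tag}\n";
}
```

Unlike the list-based version, tags that never occur are simply absent from the result rather than reported as 0.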