in reply to How to improve the accuracy of Lingua::NamedEntity ?
Update: Oh, and:
Update 2:
Okay, having said all the above - you could simply do something like this:
#!/usr/bin/perl -w
use strict;
use Lingua::EN::NamedEntity;

$/ = undef;
my $text = <DATA>;
my @entities = extract_entities($text);
my @unwanted_entities = qw( Monday Tuesday Wednesday Thursday Friday Saturday Sunday );

for (@entities) {
    my $entity = ${$_}{entity};
    if ( grep { $_ eq $entity } @unwanted_entities ) {
        print "Skipping unwanted entity: $entity\n";
    }
    else {
        print "Valid entity: $entity\n";
    }
}
The data for the above code was taken from a recent BBC News story (the script reads it from its __DATA__ section, which is not shown here). The output is as follows:
Valid entity: Mr Murakami
Valid entity: Takafumi Horie
Valid entity: Singapore
Valid entity: Mr Horie
Valid entity: Societe General Asset Management
Valid entity: Asset Management
Skipping unwanted entity: Friday
Valid entity: Tokyo Stock Exchange
Valid entity: Livedoor
Valid entity: Yoshiaki Murakami
Valid entity: Murakami
Valid entity: Tokyo
Valid entity: International Trade and Industry Ministry
Valid entity: Akio Yoshino
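If the list of unwanted entities grows, a hash lookup may be tidier than running grep over the list for every entity. The following is only a rough sketch: filter_entities is a hypothetical helper name of my own, and the hard-coded @entities list merely stands in for what extract_entities() returns (hash refs with an 'entity' key), so that it runs without the module installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the return value of extract_entities(): a list of hash
# refs, each carrying an 'entity' key (assumption: other keys ignored).
my @entities = map { +{ entity => $_ } }
    ( 'Mr Murakami', 'Friday', 'Tokyo Stock Exchange' );

# Stop list stored as hash keys so each check is a single lookup.
my %unwanted = map { $_ => 1 }
    qw( Monday Tuesday Wednesday Thursday Friday Saturday Sunday );

# Hypothetical helper: returns only the entities not in the stop list.
sub filter_entities {
    my ( $unwanted, @candidates ) = @_;
    return grep { !$unwanted->{ $_->{entity} } } @candidates;
}

print "Valid entity: $_->{entity}\n"
    for filter_entities( \%unwanted, @entities );
```

With the sample list above, 'Friday' is dropped and the other two entities are printed. The comparison is still an exact, case-sensitive string match, just as in the grep version.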
Cheers,
Darren :)