You get at the information in the entries by indexing into the array and then accessing the fields of the hash reference (see perldsc and/or tye's References Quick Reference):
print $entities[0]->{ class };
print $entities[0]->{ entity };
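To look at every entry rather than just the first, the same access pattern works inside a loop. This is only a minimal sketch: the @entities list is hand-written sample data, assumed to have the same shape as what the module returns (a list of hash references with entity, class, count and scores keys).

```perl
use strict;
use warnings;

# Hand-written sample data (an assumption), shaped like the list of
# hash references the module returns.
my @entities = (
    {
        entity => 'El Ni',
        class  => 'person',
        count  => 1,
        scores => { person => 4, place => 1, organisation => 3 },
    },
);

my @lines;
for my $e (@entities) {
    # Each element is a hash reference, so fields are reached with ->{...}
    my $scores = $e->{scores};
    my $detail = join ', ', map { "$_=$scores->{$_}" } sort keys %$scores;
    push @lines, "$e->{entity} ($e->{class}): $detail";
}
print "$_\n" for @lines;
```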
But looking at the output, it seems that the module doesn't really understand the "ñ" in El Niño, as it suggests "El Ni" (from "El Nino Southern Oscillation") as an entity:
$VAR1 = {
  'count' => 1,
  'scores' => {
    'person' => 4,
    'place' => 1,
    'organisation' => 3
  },
  'entity' => 'El Ni',
  'class' => 'person'
};
As a quick (rough) fix, I think you can use Text::Unidecode to downgrade all your text to ASCII, stripping accents and other diacritics. This would canonicalize different spellings, for example "El Nino" and "El Niño", to "El Nino".
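A rough sketch of that downgrade step, assuming Text::Unidecode is installed from CPAN and that the text is already a decoded character string (here that comes from "use utf8" and a literal; the sample sentence is made up for illustration):

```perl
use strict;
use warnings;
use utf8;                      # the string literal below is UTF-8 source
use Text::Unidecode;

my $text  = 'El Niño Southern Oscillation';
my $ascii = unidecode($text);  # transliterate to plain ASCII

print "$ascii\n";
# $ascii can now be handed to the entity extractor instead of $text
```

Note that this is lossy: the accented and unaccented spellings become indistinguishable, which is the point here but may not suit every use.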
The alternative fix would be to teach Lingua::EN::NamedEntity about Unicode, and/or to supply properly decoded data to it. I haven't looked into what is necessary for that.
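For the "properly decoded data" part, a sketch using the core Encode module might look like the following. It assumes the input arrives as raw UTF-8 bytes (for example from a filehandle opened without an encoding layer). Whether Lingua::EN::NamedEntity then copes with the decoded characters is not something this sketch checks.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes, as they would arrive from an undecoded filehandle
my $bytes = "El Ni\xC3\xB1o";

# Turn the bytes into a Perl character string
my $chars = decode('UTF-8', $bytes);

printf "%d bytes, %d characters\n", length($bytes), length($chars);
```

When reading from a file, opening it with an encoding layer, as in open my $fh, '<:encoding(UTF-8)', $file, performs the same decoding on the way in.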
In reply to Re: extracting data from Lingua::Named::Entity by Corion, in thread extracting data from Lingua::Named::Entity by rahulruns