in reply to extracting data from Lingua::Named::Entity

Each entry is a hash reference stored in the array, so you get at the information by indexing the array first and then the hash keys (see perldsc and/or tye's References Quick Reference):

print $entities[0]->{class};
print $entities[0]->{entity};
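To go through every entry rather than just the first, something like the following should work. It is only a sketch: the @entities list here is hand-written sample data shaped like the dump in this post, not real module output, and the summary format is my own choice.

```perl
use strict;
use warnings;

# Hand-written sample data (an assumption), shaped like the module's output.
my @entities = (
    {
        entity => 'El Ni',
        class  => 'person',
        count  => 1,
        scores => { person => 4, place => 1, organisation => 3 },
    },
);

# Build one summary line per entry.
my @lines;
for my $e (@entities) {
    # Each entry is a hash reference; 'scores' is a nested hash reference.
    my $scores = $e->{scores};
    my $detail = join ', ',
        map { sprintf '%s=%d', $_, $scores->{$_} } sort keys %$scores;
    push @lines, sprintf '%s [%s] (%s)', $e->{entity}, $e->{class}, $detail;
}

print "$_\n" for @lines;
```

The nested 'scores' hash is reached the same way as the top-level fields, just one level deeper.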

But looking at the output, it seems that the module doesn't really understand the "ñ" in El Niño, as it suggests "El Ni" (from "El Nino Southern Oscillation") as an entity:

$VAR1 = {
          'count' => 1,
          'scores' => {
                        'person' => 4,
                        'place' => 1,
                        'organisation' => 3
                      },
          'entity' => 'El Ni',
          'class' => 'person'
        };

I think as a quick (rough) fix, you can use Text::Unidecode to downgrade all your text to ASCII, which strips diacritics and other non-ASCII decoration. This would canonicalize different spellings, for example "El Nino" and "El Niño", to "El Nino".
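A minimal sketch of that rough fix, assuming Text::Unidecode is installed from CPAN and that the text is already a decoded character string (here guaranteed by "use utf8" and a literal in the source). The sample strings are just the two spellings mentioned above.

```perl
use strict;
use warnings;
use utf8;                 # the source below contains a literal "ñ"
use Text::Unidecode;      # CPAN module, exports unidecode()

# unidecode() expects decoded character strings, not raw UTF-8 bytes.
my @spellings  = ('El Niño', 'El Nino');
my @normalized = map { unidecode($_) } @spellings;

# Both spellings should now be the same plain ASCII string.
print "$_\n" for @normalized;
```

You would run the text through unidecode() before handing it to the entity extraction, so the module only ever sees ASCII.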

The alternative fix would be to teach Lingua::EN::NamedEntity about Unicode, and/or to supply properly decoded data to it. I haven't looked into what is necessary for that.
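For the "properly decoded data" half, this is roughly what decoding looks like with the core Encode module. It is a sketch only: the byte string is made-up sample input, and I have not tested whether Lingua::EN::NamedEntity then copes with the "ñ".

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes, as they might arrive from a file or socket (sample input).
my $bytes = "El Ni\xC3\xB1o";

# Decode the bytes into a Perl character string.
my $text = decode('UTF-8', $bytes);

# After decoding, the two bytes C3 B1 have become the single character "ñ".
printf "%d bytes, %d characters\n", length($bytes), length($text);
```

When reading from a file, opening it with an encoding layer such as '<:encoding(UTF-8)' does the same decoding for you.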

Re^2: extracting data from Lingua::Named::Entity
by rahulruns (Scribe) on Jul 05, 2018 at 09:41 UTC

    Thanks a lot, that works. I will look into the Unicode part.