rahulruns has asked for the wisdom of the Perl Monks concerning the following question:

I am working with Lingua::EN::NamedEntity. When I give it text for named entity detection, it returns an array. With Dumper I can see the data, but how do I extract values from it? I need both the class and the entity.

use strict;
use warnings;
use Data::Dumper;
use Lingua::EN::NamedEntity;

my @entities = extract_entities("El Nino is the warm phase of the El Niño Southern Oscillation (commonly called ENSO) and is associated with a band of warm ocean water that develops in the central and east-central equatorial");
print Dumper(@entities);

Output:

$ perl entity.pl
$VAR1 = {
  'count' => 1,
  'scores' => {
    'person' => 4,
    'place' => 1,
    'organisation' => 3
  },
  'entity' => 'El Ni',
  'class' => 'person'
};
$VAR2 = {
  'count' => 1,
  'scores' => {
    'organisation' => 2,
    'person' => 9,
    'place' => 1
  },
  'entity' => 'El Nino',
  'class' => 'person'
};
$VAR3 = {
  'class' => 'person',
  'entity' => 'Southern Oscillation',
  'scores' => {
    'organisation' => 2,
    'place' => 1,
    'person' => 4
  },
  'count' => 1
};

Re: extracting data from Lingua::Named::Entity
by Corion (Patriarch) on Jul 05, 2018 at 09:17 UTC

    You get at the information from the entries by accessing the array and then the fields of the hash (perldsc and/or tye's References Quick Reference):

    print $entities[0]->{ class };
    print $entities[0]->{ entity };
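    To get the class and entity of every result rather than just the first, a loop over the array is one option. The following is a minimal sketch: the sample data is copied from the Dumper output in the question so that it runs without the module installed, and describe_entity is a hypothetical helper name, not part of Lingua::EN::NamedEntity.

```perl
use strict;
use warnings;

# Sample data copied from the Dumper output in the question; in the real
# script this array would be the return value of extract_entities(...).
my @entities = (
    { entity => 'El Ni', class => 'person', count => 1,
      scores => { person => 4, place => 1, organisation => 3 } },
    { entity => 'El Nino', class => 'person', count => 1,
      scores => { person => 9, place => 1, organisation => 2 } },
    { entity => 'Southern Oscillation', class => 'person', count => 1,
      scores => { person => 4, place => 1, organisation => 2 } },
);

# Hypothetical helper: each element is a hash reference, so its fields
# are reached with ->{...}.
sub describe_entity {
    my ($e) = @_;
    return "$e->{entity} => $e->{class}";
}

# Print one "entity => class" line per detected entity.
print describe_entity($_), "\n" for @entities;

# The nested 'scores' hash needs one more level of dereferencing.
print "person score of first entity: $entities[0]{scores}{person}\n";
```

    The arrow between adjacent subscripts is optional, so $entities[0]{class} and $entities[0]->{class} mean the same thing.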

    But looking at the output, it seems that the module doesn't really understand the "ñ" in El Niño, as it suggests "El Ni" (from "El Niño Southern Oscillation") as an entity:

    $VAR1 = {
      'count' => 1,
      'scores' => {
        'person' => 4,
        'place' => 1,
        'organisation' => 3
      },
      'entity' => 'El Ni',
      'class' => 'person'
    };

    As a quick (rough) fix, I think you can use Text::Unidecode to downgrade all your text to ASCII, stripping diacritics and other adornments. This would canonicalize different spellings, for example "El Nino" and "El Niño", to "El Nino".
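    A minimal sketch of that ASCII downgrade, assuming Text::Unidecode is installed from CPAN. The helper name to_ascii is hypothetical; unidecode() is the function the module exports by default.

```perl
use strict;
use warnings;
use Text::Unidecode;    # CPAN module; exports unidecode() by default

# Hypothetical helper: transliterate a character string to plain ASCII.
sub to_ascii {
    my ($text) = @_;
    return unidecode($text);
}

# "\x{F1}" is the character "ñ", written as an escape so the example
# does not depend on the encoding of the source file.
my $text = "El Ni\x{F1}o Southern Oscillation";

# The downgraded string is what would then be passed to extract_entities().
print to_ascii($text), "\n";
```

    unidecode() expects a decoded character string, so text read from a file or STDIN would probably need decoding first, for example with Encode::decode('UTF-8', $bytes).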

    The alternative fix would be to teach Lingua::EN::NamedEntity about Unicode, and/or to supply properly decoded data to it. I haven't looked into what is necessary for that.

      Thanks a lot, that works. I will look into the Unicode issue.