in reply to Count number of occurrences of a list of words in a file

In the three solutions that were posted, everyone uses 'exists $hash{key}' followed by '$hash{key}=...', and to me these look like two look-ups in the hash. Is this efficient? Can it be made more efficient?

Athanasius

++$count{$word} if exists $count{$word};

Tux

$cnt{$_}++ for grep { exists $cnt{$_} } m/(\w+)/g;

AnomalousMonk

exists $count{$_} and ++$count{$_} for $line =~ m{ $rx_word }xmsg;

Re^2: Count number of occurrences of a list of words in a file
by Cristoforo (Curate) on May 09, 2018 at 22:28 UTC
    Athanasius identified the code slowdown here. Hash lookups are the fastest approach, both here and in general. The two lookups are necessary because you have to verify that the word being checked exists in the counting hash. Without this check, a word that is not on the search list would be added to the hash and erroneously counted.
Re^2: Count number of occurrences of a list of words in a file
by AnomalousMonk (Archbishop) on May 09, 2018 at 22:27 UTC

    My assumption was that there might be many things in Azaghal's input textfile.txt that look like "words", and he or she only wanted to count the words specified in the list.txt file. If that's the case, one must check that a "word" exists in the hash before incrementing it, or else one will autovivify a "word" that was not previously present. Hence, two hash accesses are necessary.
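
    A minimal sketch of the difference, not taken from any of the posted solutions: the word list and input line below are made-up stand-ins for list.txt and textfile.txt, and the names %guarded and %unguarded are hypothetical. It pre-seeds both hashes with the wanted words, then counts once with the exists check and once without it.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the contents of list.txt and one line of textfile.txt.
my @wanted = qw(apple pear);
my $line   = 'apple banana apple cherry pear';

# Guarded: only keys already present in the hash are incremented.
my %guarded = map { $_ => 0 } @wanted;
for my $word ($line =~ /(\w+)/g) {
    ++$guarded{$word} if exists $guarded{$word};
}

# Unguarded: incrementing a missing key creates it, so every word seen gets counted.
my %unguarded = map { $_ => 0 } @wanted;
++$unguarded{$_} for $line =~ /(\w+)/g;

print join(' ', map { "$_=$guarded{$_}" }   sort keys %guarded),   "\n";
print join(' ', map { "$_=$unguarded{$_}" } sort keys %unguarded), "\n";
```

    The first line printed contains only apple and pear; the second also contains banana and cherry, which is the unwanted side effect the exists check prevents.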


    Give a man a fish:  <%-{-{-{-<