Bman70 has asked for the wisdom of the Perl Monks concerning the following question:

I've done a few hours of Googling and can't seem to find the solution.

I want to check if an array of single words (strings) contains several specific words, in any order.
I have this code, cobbled together from advice found online:

my %hash = @allwords; #puts array in hash to search / compare.
if (exists $hash{'word1' && 'word2' && 'word3'}) { ...do stuff ; }

I want the if to be true if all the words are found, in any order. But it's returning true
if just word3 is found. I think it's ignoring word1 and word2.
I'm too new at Perl to know if it's even appropriate syntax, although it doesn't give any error currently.
I've also found a technique using $_ eq, but it doesn't seem to be matching the way I want:

if ( grep { $_ eq 'word1' && 'word2' } @allwords ) { ...do stuff; }

Keep in mind, I never found any solution to match several values; so this && tactic is a semi-educated guess.

p.s. I don't know why the code format is so tiny in the preview.. I can hardly read the code. But hopefully
it's not that way when I post it.

Re: Check array for several matches
by 1nickt (Canon) on May 10, 2017 at 22:25 UTC

    See the core module List::Util, especially all.

    use strict; use warnings;
    use List::Util 'all';

    # see below for how to properly create your hash
    my %words = (
        foo => 1,
        bar => 1,
        baz => 1,
        qux => 1,
    );

    my @must_exist = (
        'bar',
        'qux',
    );

    my $all_found = all { exists $words{ $_ } } @must_exist;

    if ( $all_found ) { ... }

    As for the code you have:

    • my %hash = @allwords; #puts array in hash to search / compare.
      ... yes, but using pairs. So an array of ('foo', 'bar', 'baz', 'qux') will give a hash with only two key-value pairs, with keys foo and baz. Thus later searching on the keys will not give the desired results.

      If you do want to create a hash with a key for each element in a list, use map:

      @array = ('foo', 'bar', 'baz', 'qux');
      %hash = map { $_ => 1 } @array;

    • if (exists $hash{'word1' && 'word2' && 'word3'}) { ...do stuff ; }
      ... this won't work. You need to check whether each key exists individually:
      if ( exists $hash{'foo'} and exists $hash{'bar'} ) { ... }

    • if ( grep { $_ eq 'word1' && 'word2' } @allwords ) { ...do stuff; }
      ... this has several problems, which I will let other monks explain, but all of which can be avoided by using all.
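    Putting the pieces above together, a minimal sketch might look like the following. The word lists are made-up sample data, and contains_all is a hypothetical helper name used only for illustration; it is not part of List::Util.

```perl
use strict;
use warnings;
use List::Util 'all';

# Hypothetical helper: true if every word in @$required occurs in @$words.
sub contains_all {
    my ($words, $required) = @_;

    # One key per element, so a lookup tests membership rather than pairing.
    my %seen = map { $_ => 1 } @$words;

    # all() stops at the first required word that is missing.
    return all { exists $seen{$_} } @$required;
}

my @allwords = qw( word3 other word1 stuff word2 );   # sample data

print "all found\n" if contains_all( \@allwords, [qw( word1 word2 word3 )] );
print "not found\n" unless contains_all( \@allwords, [qw( word1 word9 )] );
```

    The arrays are passed by reference so that the two lists stay separate inside the sub.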

    Hope this helps!

    update: added comments on OP code, then example with all


    The way forward always starts with a minimal test.
Re: Check array for several matches
by tobyink (Canon) on May 10, 2017 at 22:41 UTC
Re: Check array for several matches
by kcott (Archbishop) on May 11, 2017 at 06:48 UTC

    G'day Bman70,

    Welcome to the Monastery.

    my %hash = @allwords; #puts array in hash to search / compare.

    That's not doing what you think. You'll end up with a hash that looks something like this:

    ( 'word1' => 'word2', 'word3' => undef, )

    Here's an example (see Data::Dump, which exports the dd function, if you're unfamiliar with that module; see perlrun for '-M' and all other command switches):

    $ perl -MData::Dump -e '@a = qw{b c a}; %h = @a; dd \%h'
    { a => undef, b => "c" }

    The usual idiom uses map:

    my %hash = map { $_ => 1 } @allwords;

    Extending that first example:

    $ perl -MData::Dump -e '@a = qw{b c a}; %h = map { $_ => 1 } @a; dd \%h'
    { a => 1, b => 1, c => 1 }

    if (exists $hash{'word1' && 'word2' && 'word3'}) { ...

    That's not working because "'word1' && 'word2' && 'word3'" evaluates to "'word3'"; your condition effectively becomes "exists $hash{'word3'}". Here's an example of that:

    $ perl -E 'my $x = "a" && "b" && "c"; say $x'
    c

    if ( grep { $_ eq 'word1' && 'word2' } @allwords ) { ...

    That suffers from much the same problem as the previous point:

    $ perl -E '$_ = "a"; my $x = $_ eq "a" && "b"; say $x'
    b

    A typical method of determining how Perl evaluates your code is with B::Deparse. It's usually run from the command line; using the "-p" option is often useful, and seeing the results with and without this option can often provide additional insight. Here's how Perl sees your grep code:

    $ perl -MO=Deparse -e 'grep { $_ eq "word1" && "word2" } @a'
    grep {'word2' if $_ eq 'word1';} @a;
    -e syntax OK
    $ perl -MO=Deparse,-p -e 'grep { $_ eq "word1" && "word2" } @a'
    grep({(($_ eq 'word1') and 'word2');} @a);
    -e syntax OK

    I suggest you read through "perlintro - Perl introduction for beginners". It will help you with some of the basic mistakes you've made. It also has many links to more detailed documentation and advanced topics: this makes it a handy reference for future (repeat) use.

    That covers all the issues you had with your various attempts at a solution. I see a number of working solutions have already been provided. Here's how I might have solved this (assuming all the elements of your array are unique) using grep in scalar context.

    $ perl -E 'my @x = qw{b c a}; say scalar grep { /^(?:a|b|c)$/ } @x'
    3
    $ perl -E 'my @x = qw{x y z}; say scalar grep { /^(?:a|b|c)$/ } @x'
    0
    $ perl -E 'my @x = qw{x c y a b z}; say scalar grep { /^(?:a|b|c)$/ } @x'
    3

    You can force scalar context with an operator, which lets you write your if condition like this:

    my $regex = ...;
    if (@matchwords == grep { /$regex/ } @allwords) { ...

    In the examples, I gave the regex as an alternation (/^(?:a|b|c)$/) for illustrative purposes. I assume your wordN values are just test data. For your real code, a similar alternation may be appropriate; however, something like /^word[1-3]$/ or /^word\d+$/ may be better. If you think the alternation is the best choice for you, consider creating $regex dynamically from @matchwords: see "Building Regex Alternations Dynamically" for details of how to do this.
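    As a rough sketch of building the alternation from @matchwords, something like the following could work. The word lists are sample data, and the %found hash is an addition of this sketch (not part of the approach above) so that repeated words in @allwords do not inflate the count; @matchwords itself is assumed to contain no duplicates.

```perl
use strict;
use warnings;

my @matchwords = qw( word1 word2 word3 );                 # sample data
my @allwords   = qw( word2 abc word3 word2 word1 xyz );   # word2 appears twice

# Escape each word so regex metacharacters match literally,
# then join the words into one anchored alternation.
my $alternation = join '|', map { quotemeta($_) } @matchwords;
my $regex       = qr/^(?:$alternation)$/;

# Record each distinct matching word once, so repeats in @allwords
# cannot inflate the count.
my %found;
$found{$_} = 1 for grep { /$regex/ } @allwords;

# keys in numeric (scalar) context gives the number of distinct matches.
if ( @matchwords == keys %found ) {
    print "all words present\n";
}
```

    Without the %found step, the plain count from grep would be 4 for this sample data and the comparison with @matchwords would fail, which is why the uniqueness assumption matters.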

    Finally, when choosing your solution, you may find Benchmark useful.
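    For illustration only, a comparison with the core Benchmark module might be set up as below. The two candidate subs, their names, and the synthetic word list are assumptions of this sketch; real timings depend on your data and should be measured on it.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';
use List::Util 'all';

my @allwords   = map { "word$_" } 1 .. 1000;   # synthetic sample data
my @matchwords = qw( word1 word500 word1000 );

# Candidate 1: build a full lookup hash, then test each required word.
sub by_hash {
    my %seen = map { $_ => 1 } @allwords;
    return all { exists $seen{$_} } @matchwords;
}

# Candidate 2: track only the required words while scanning the list.
sub by_count {
    my %want = map { $_ => 0 } @matchwords;
    for my $word (@allwords) {
        $want{$word}++ if exists $want{$word};
    }
    return all { $_ > 0 } values %want;
}

# Run each candidate 2000 times and print a comparison table.
cmpthese( 2000, { by_hash => \&by_hash, by_count => \&by_count } );
```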

    — Ken

Re: Check array for several matches
by Marshall (Canon) on May 11, 2017 at 01:55 UTC
    I tried to write some straightforward code for you. Pay attention to the algorithm (the method of determining the result). Nothing really "fancy" is required here, just logical thinking about how to determine a successful result. "map" is just a kind of a foreach loop. Don't worry about that until you understand the simple loops below.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @requiredMatches = ('word1', 'word2', 'word3');
    my @testArray = qw (word2 abc word3 xyz word2 asdf word1 word2);

    my %requiredMatches;

    # set up a hash table to count occurrences of
    # each word that is required to be there.
    # Start with "zero", haven't seen it yet.
    foreach my $requiredWord (@requiredMatches)
    {
        $requiredMatches{$requiredWord}=0;
    }

    # Now for each word in the input @testArray, increment its
    # "seen" count, if and only if it is one of the words
    # that is being "tracked" meaning that a hash entry exists in
    # %requiredMatches for that word, i.e., $requiredMatches{$word}
    #
    # If you don't check for "exists", Perl will happily create a new
    # hash entry that is incremented. We don't want that here.
    # We only want the %requiredMatches hash to contain only the
    # "must have" words. Update: Well the "if" is not absolutely
    # required below because we are only going to count "zeroes",
    # but I prefer this formulation that doesn't generate unnecessary
    # hash entries.
    foreach my $word (@testArray)
    {
        $requiredMatches{$word}++ if exists $requiredMatches{$word};
    }

    # If every hash table entry got incremented (no zero initial
    # values left), then every required Word was seen at least once.
    my $countWordMissing=0;
    foreach my $requiredWord (keys %requiredMatches)
    {
        $countWordMissing++ if ($requiredMatches{$requiredWord} == 0);
    }

    print "testArray contains all words\n" if $countWordMissing==0;

      My approach was almost the same as yours, except that I used a function so I could short-circuit and return as soon as an item was determined to be missing.

      use warnings;
      use strict;
      use Data::Dumper;

      my @allwords = qw( mary had a little lamb its fleece was white as snow );
      my @checkwords = qw( little lamb fleece everywhere );

      print "All words were ", check_array_for_words(\@allwords, \@checkwords) ? "" : "not ", "found.\n";

      sub check_array_for_words {
          my ($ar_allwords, $ar_checkwords) = @_;
          my %hash;
          foreach (@$ar_allwords){
              $hash{$_} = undef;
          }
          print Dumper(\%hash);

          foreach (@$ar_checkwords) {
              if (not exists $hash{$_}) {
                  return 0;
              }
          }
          return 1;
      }

      __END__
      $VAR1 = {
                'had' => undef,
                'lamb' => undef,
                'a' => undef,
                'fleece' => undef,
                'little' => undef,
                'white' => undef,
                'as' => undef,
                'was' => undef,
                'mary' => undef,
                'snow' => undef,
                'its' => undef
              };
      All words were not found.
        Yes, this will certainly work! I made an implicit assumption that there were very few check words (perhaps just 3) to be checked against a potentially very big list of other words. So I restricted my hash to just the 3 checked words instead of all_words. But I think your code is just fine.

        I don't think it is necessary to beat this thing to death in all its variations. I don't know whether I actually succeeded or not, but I was attempting to address the OP's comments about searching the internet for hours and copying code that he didn't understand and that didn't work for him.

        I avoided map{} and grep{} (which are special kinds of foreach loops) and attempted to write very simple loops in the hope that the OP will take the time to understand them. I added a whole lot of verbiage in an attempt to explain an example of the algorithmic thought process. I am hoping that the OP learned not only a solution to this current problem, but also stuff that is helpful to future problems.
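        To make the "special kinds of foreach loops" remark concrete, here is a small sketch showing a map and a grep next to hand-written loops that produce the same lists. The sample words and variable names are invented for this illustration.

```perl
use strict;
use warnings;

my @words = qw( word2 abc word3 );   # sample data

# map: build a new list by transforming every element.
my @upper_map = map { uc $_ } @words;

my @upper_loop;
foreach my $word (@words) {
    push @upper_loop, uc $word;
}

# grep: build a new list from the elements that pass a test.
my @hits_grep = grep { /^word/ } @words;

my @hits_loop;
foreach my $word (@words) {
    push @hits_loop, $word if $word =~ /^word/;
}

print "@upper_map\n";   # same list as @upper_loop
print "@hits_grep\n";   # same list as @hits_loop
```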

Re: Check array for several matches
by ankitpati (Sexton) on May 11, 2017 at 13:38 UTC

    A shorter, more portable solution that uses only Perl built-ins.

    Also, since you are starting out with Perl, knowledge of these techniques will be essential in your journey to Perl greatness!

    Everything up to my %hash should be clear from the previous responses.

    Enhancement (thank you, AnomalousMonk):

    The deduplication code which I previously thought would be necessary is not actually required.

    The next statement prints do stuff if the specified condition is met. Let us look at each part of the condition.

    Evaluated in scalar context, arrays in Perl return the number of elements they contain. So @required returns the number of elements it contains.

    @hash{@required} uses the hash-slice syntax to extract a list of values associated with the keys that happen to be members of @required. Trying to extract a value against a key which does not exist in a hash results in an undefined value.

    Enhancement:

    The defined test is not really required, as it is implicit.

    grep { $_ } returns a list of all defined elements from the list previously returned. Recall that non-existent keys return an undefined value, which evaluates to false in the boolean context, so they are weeded out by this.

    Enhancement (thank you, haukex):

    In scalar context, grep returns the number of elements in the list that match the condition. Since that is what we need, we just use the value in scalar context directly.
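    The pieces described above can be tried in isolation with a short sketch like this one. The hash contents and the variable names @values and $true_count are made up for the illustration.

```perl
use strict;
use warnings;

my %hash     = ( this => 1, is => 1, a => 1 );   # sample data
my @required = qw( this is missing );

# Hash slice: one value per requested key, undef where the key is absent.
my @values = @hash{@required};          # (1, 1, undef)

# Assigning grep to a scalar puts it in scalar context, so it returns a count.
my $true_count = grep { $_ } @values;   # number of true values

# An array used with == is also in scalar context and yields its length.
print "all present\n"  if @required == $true_count;
print "some missing\n" if @required != $true_count;
```

    Note that grep { $_ } tests for truth rather than definedness, so it relies on every stored value being true (here, 1). If a hash might hold 0 or an empty string as a value, testing with defined or exists would be the safer choice.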

    I leave figuring out the remainder of the code as an exercise to the reader.

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my @allwords = qw( this is a list of words that is required for tests );
    my @required = qw( this is a list of words that is required );

    my %hash = map { $_ => 1 } @allwords;

    print "do stuff\n" if @required == grep { $_ } @hash{@required};

    my @notthere = qw( this list is not there in allwords );

    print "don't do stuff\n" unless @notthere == grep { $_ } @hash{@notthere};

    I know this could get a bit overwhelming, so feel free to ask for any clarifications.

      The surrounding box-brackets [ ] convert the previously returned list into an ARRAY reference, and the @{ } construct converts the ARRAYref into an array. This is necessary because an array is different from a list. A list does not return the number of elements it contains in scalar context, but an array does, ...

      While all of that is correct, note that grep in scalar context returns the number of times the expression was true, so the @{ [ ... ] } isn't necessary:

      print "do stuff\n" if @required == grep { defined } @hash{@required};

      In addition to haukex's comments here, I would say that the
          @required = keys %{{ map { $_ => 1 } @required }};
      and
          @notthere = keys %{{ map { $_ => 1 } @notthere }};
      statements are unnecessary. All they do is rearrange the order of elements in the respective arrays in a more-or-less random way, and this rearrangement has no useful effect.

      c:\@Work\Perl>perl -wMstrict -le
      "my @allwords = qw(this is a list of words that is required for tests);
       my @required = qw(this is a list of words that is required);
       ;;
       my %hash = map { $_ => 1 } @allwords;
       ;;
       print 'do stuff' if @required == grep { defined $_ } @hash{@required};
       ;;
       my @notthere = qw(this list is not there in allwords);
       ;;
       print 'do not do stuff'
         unless @notthere == grep { defined $_ } @hash{@notthere};
      "
      do stuff
      do not do stuff


      Give a man a fish:  <%-{-{-{-<

Re: Check array for several matches
by Lotus1 (Vicar) on May 12, 2017 at 15:03 UTC

    You are looking for a way to check whether one list is a subset of another. List::Compare has functions that do that. You pass two lists to the constructor, then call a method to find out whether the list on the right is a subset of the one on the left, or vice versa. Note the backslash, which passes a reference to each array. If you pass two arrays to a function directly, they are flattened into a single argument list and can no longer be told apart.

    use warnings;
    use strict;
    use List::Compare;

    my @allwords = qw( mary had a little lamb its fleece was white as snow );
    my @checkwords1 = qw( little lamb fleece everywhere );
    my @checkwords2 = qw( little lamb fleece );

    my $lc1 = List::Compare->new(\@allwords, \@checkwords1);
    my $lc2 = List::Compare->new(\@allwords, \@checkwords2);

    ## See if the list on the Right, @checkwords,
    ## is a subset of the list on the left, @allwords.
    print "Check 1: All words were ", $lc1->is_RsubsetL ? "" : "not ", "found.\n";
    print "Check 2: All words were ", $lc2->is_RsubsetL ? "" : "not ", "found.\n";

    __END__
    Check 1: All words were not found.
    Check 2: All words were found.
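
    The remark about lists getting mixed up when passed to a function can be seen with core Perl alone. This is a small sketch; count_flat and count_refs are hypothetical names invented for the illustration and have nothing to do with List::Compare.

```perl
use strict;
use warnings;

my @left  = qw( a b c );   # sample data
my @right = qw( x y );

# Passing the arrays directly: both are flattened into one argument list,
# so the sub cannot tell where @left ends and @right begins.
sub count_flat { return scalar @_ }

# Passing references: each array arrives as a single, separate argument.
sub count_refs {
    my ($l, $r) = @_;
    return ( scalar @$l, scalar @$r );
}

my $flat_count = count_flat( @left, @right );
my ( $left_count, $right_count ) = count_refs( \@left, \@right );

print "flattened: $flat_count elements\n";                # 5
print "by reference: $left_count and $right_count\n";     # 3 and 2
```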