spanner has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
I have a large array of strings and I would like to pull all strings from the array that contain multiple substrings. I've tried the following which does not work:
$largeArray[0] = "this is a test";
$largeArray[1] = "this is another test";
$largeArray[2] = "that is another test";
$largeArray[3] = "test this is another";
$searchCriteria[0] = "another";
$searchCriteria[1] = "test";
if (@newArray = grep {/@searchCriteria/} @largeArray) {
    print @newArray;
}
This returns (incorrect):
this is another test
that is another test

So using the same @largeArray, I tried the following:
if (@newArray = grep {/another/ and /test/} @largeArray) {
    print @newArray;
}
which returns (correct):
this is another test
that is another test
test this is another
My problem is that I cannot be certain how many elements will be in my @searchCriteria. Therefore it is not feasible to hard-code a grep in the latter form.
Any ideas or suggestions (or am I missing something fundamental)?

Thanks,
Brian
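
For readers wondering why the first attempt returns only two lines: an array interpolated into a pattern is joined with the list separator $" (a single space by default), so /@searchCriteria/ behaves like /another test/. A minimal sketch using the data from the question; the variable $pattern is introduced here only for illustration:

```perl
use strict;
use warnings;

my @largeArray = (
    "this is a test",
    "this is another test",
    "that is another test",
    "test this is another",
);
my @searchCriteria = ("another", "test");

# Interpolating an array joins its elements with $" (default: one space).
my $pattern = "@searchCriteria";
print "pattern: $pattern\n";

# The same interpolation happens inside the regex, so only strings that
# contain the literal phrase "another test" are kept.
my @hits = grep { /@searchCriteria/ } @largeArray;
print "$_\n" for @hits;
```

The string "test this is another" contains both words but not the joined phrase, which is why it is missing from the first result.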

Replies are listed 'Best First'.
Re: grep for array-of-strings from an array-of-strings
by Enlil (Parson) on Apr 09, 2003 at 19:31 UTC
    You could just loop over the search criteria, doing a new grep each time (the last line of the loop is there to bail out once @newArray is empty), i.e.:
    use strict;
    use warnings;

    my @largeArray = (
        "this is a test",
        "this is another test",
        "that is another test",
        "test this is another",
    );
    my @searchCriteria = ("another", "test");
    my @newArray = @largeArray;
    foreach my $criteria (@searchCriteria) {
        @newArray = grep { /$criteria/ } @newArray;
        die "Nothing Found\n" unless @newArray;
    }
    print join "\n", @newArray;

    update: Since $criteria can be many different things including chars that might need to be escaped you might want to change /$criteria/ to /\Q$criteria\E/

    update2: changed the ugly: last unless @newArray or die "Nothing Found\n"; to something nicer to look at. (die "Nothing Found\n" unless @newArray; )
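
To illustrate the \Q...\E suggestion, here is a small sketch with made-up data (the strings and criteria are hypothetical, chosen because they contain regex metacharacters). It uses `last` rather than `die` so the script can carry on with an empty result:

```perl
use strict;
use warnings;

# Hypothetical data: the criteria contain '.' and '(' ')' on purpose.
my @largeArray = (
    "cost is 3.50 (approx)",
    "cost is 3x50 approx",
    "no cost given",
);
my @searchCriteria = ("3.50", "(approx)");

my @newArray = @largeArray;
foreach my $criteria (@searchCriteria) {
    # \Q...\E escapes metacharacters, so each criterion is matched literally.
    @newArray = grep { /\Q$criteria\E/ } @newArray;
    last unless @newArray;    # nothing left to filter
}
print "$_\n" for @newArray;
```

Without the escaping, "3.50" would also match "3x50" and "(approx)" would be treated as a capture group, so the second string would pass both tests.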

    -enlil

Re: grep for array-of-strings from an array-of-strings
by nothingmuch (Priest) on Apr 09, 2003 at 19:24 UTC
    You might be able to get some results constructing a regex of look ahead assertions:
    my $test = construct_regex(@searcharray);
    my @results = grep { /$test/ } @longarray;

    sub construct_regex {
        my $regex = "^";
        foreach my $subregex (@_) {
            $regex .= "(?=.*" . $subregex . ")";
        }
        qr/$regex/;
    }
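
A self-contained variant of the look-ahead idea, run against the data from the question. Two things here are additions and not part of the reply: quotemeta, which treats each criterion as a literal string, and the /s flag, which lets .* cross newlines. Treat it as a sketch under those assumptions:

```perl
use strict;
use warnings;

# Build one anchored pattern with a look-ahead assertion per criterion.
sub construct_regex {
    my $regex = join '', map('(?=.*' . quotemeta($_) . ')', @_);
    return qr/^$regex/s;
}

my @largeArray = (
    "this is a test",
    "this is another test",
    "that is another test",
    "test this is another",
);
my @searchCriteria = ("another", "test");

my $test    = construct_regex(@searchCriteria);
my @results = grep { /$test/ } @largeArray;
print "$_\n" for @results;
```

Because every assertion starts at the beginning of the string, the order in which the words appear does not matter.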


    -nuffin
    zz zZ Z Z #!perl
Re: grep for array-of-strings from an array-of-strings
by diotalevi (Canon) on Apr 09, 2003 at 19:40 UTC

    Updated: Oops! I mistook any() for all(). Fixed thusly.

    This is easy - your grep condition should return true if all of the search strings are found within the string being tested. In this case I created a function which tests a string against a criteria list and returns false if any don't match. Then I modified your grep so that it tests each element against all the criteria.

    sub indexall {
        # $_[0] is the string under test; the remaining arguments are the criteria.
        -1 == index($_[0], $_) and return 0 for @_[1 .. $#_];
        return 1;
    }
    if (@newArray = grep indexall($_, @searchCriteria), @largeArray) {
        print @newArray;
    }
      I think he wants not 'any' but 'each'.
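
The same "every criterion must be found" test can also be written with all() from List::Util. This is an alternative sketch, not the code from the reply, and it assumes List::Util 1.33 or newer (where all() was added):

```perl
use strict;
use warnings;
use List::Util qw(all);    # all() needs List::Util 1.33 or newer

my @largeArray = (
    "this is a test",
    "this is another test",
    "that is another test",
    "test this is another",
);
my @searchCriteria = ("another", "test");

my @newArray = grep {
    my $string = $_;    # copy, because all() re-aliases $_ to each criterion
    all { index($string, $_) >= 0 } @searchCriteria;
} @largeArray;

print "$_\n" for @newArray;
```

Since index() does plain substring searches, no escaping of metacharacters is needed.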
Re: grep for array-of-strings from an array-of-strings
by Limbic~Region (Chancellor) on Apr 09, 2003 at 23:55 UTC
    spanner,
    In the spirit of TIMTOWTDI, I provide the following code:
    #!/usr/bin/perl -w
    use strict;

    my @largeArray;
    $largeArray[0] = "this is a test";
    $largeArray[1] = "this is another test";
    $largeArray[2] = "that is another test";
    $largeArray[3] = "test this is another";

    my @searchCriteria;
    $searchCriteria[0] = "another";
    $searchCriteria[1] = "test";

    my $match_code = join(' and ', map "\$_[0] =~ /\Q$_\E/", @searchCriteria);
    my $matcher = eval "sub { $match_code }; ";
    my @newArray = grep $matcher->($_), @largeArray;
    print "$_\n" foreach(@newArray);

    Let me explain my code as it may not appear to be self-explanatory:

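  • $match_code is a string holding the body of the subroutine we are building: one regex match against $_[0] per search criterion, joined with 'and'
  • $matcher is the code reference we get back from eval'ing that string as an anonymous sub, so the code is compiled only once
  • The grep calls $matcher on each element and pulls out the matching entries from @largeArray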
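
To make the list above concrete, this sketch prints the generated source before compiling it. The `or die` check and the two sample strings are additions for illustration:

```perl
use strict;
use warnings;

my @searchCriteria = ("another", "test");

# Build the body of the sub as text: one match per criterion, joined by 'and'.
my $match_code = join(' and ', map("\$_[0] =~ /\Q$_\E/", @searchCriteria));
print "$match_code\n";

# Compile the text once into an anonymous sub.
my $matcher = eval "sub { $match_code }" or die "eval failed: $@";

for my $string ("test this is another", "this is a test") {
    my $verdict = $matcher->($string) ? "match" : "no match";
    print "$string: $verdict\n";
}
```

For the two criteria above, the generated text is `$_[0] =~ /another/ and $_[0] =~ /test/`.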

    I would be interested in seeing how this compares in a benchmark with your real data compared to some of the solutions others have provided.
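
One way such a comparison could be set up is with the core Benchmark module. The sketch below is only a starting point: the data is the question's four strings repeated 1000 times (an assumption, not real data), the labels in %candidates are made up, and no timings are claimed here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: the four sample strings, repeated.
my @largeArray = (
    "this is a test",
    "this is another test",
    "that is another test",
    "test this is another",
) x 1000;
my @searchCriteria = ("another", "test");

# Precompiled look-ahead pattern.
my $lookahead_src = join '', map("(?=.*\Q$_\E)", @searchCriteria);
my $lookahead     = qr/^$lookahead_src/s;

# Precompiled sub generated from text.
my $match_code = join(' and ', map("\$_[0] =~ /\Q$_\E/", @searchCriteria));
my $matcher    = eval "sub { $match_code }" or die "eval failed: $@";

# Each candidate returns the number of matching strings.
my %candidates = (
    successive_grep => sub {
        my @found = @largeArray;
        for my $criterion (@searchCriteria) {
            @found = grep { /\Q$criterion\E/ } @found;
        }
        return scalar @found;
    },
    lookahead => sub {
        return scalar grep { /$lookahead/ } @largeArray;
    },
    index_all => sub {
        return scalar grep {
            my $string = $_;
            !grep { index($string, $_) < 0 } @searchCriteria;
        } @largeArray;
    },
    eval_sub => sub {
        return scalar grep { $matcher->($_) } @largeArray;
    },
);

# Run each candidate 200 times and print a comparison table.
cmpthese(200, \%candidates);
```

Results will depend heavily on the real data (string lengths, number of criteria, how early a mismatch is found), so it is worth substituting a representative sample before drawing conclusions.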

    Cheers - L~R

    Update: Some great (cosmetic as well as performance) code changes as pointed out by diotalevi in the CB (and below)

      Fixed the regex

      Some cosmetic changes that make the code easier to work with.

      my $match_code = join(' and ', map "\$_[0] =~ /\Q$_\E/", @searchCriteria);
      my $matcher = eval "sub { $match_code }; ";
      my @newArray = grep $matcher->($_), @largeArray;
Re: grep for array-of-strings from an array-of-strings
by spanner (Novice) on Apr 10, 2003 at 15:43 UTC
    Thanks to all. I implemented a regex/grep/eval hybrid based on L~R's submittal, and a few others. The key was to find string1 AND string2 in an array of strings, as "nothingmuch" pointed out. I'm now testing for (in)efficiencies.

    Cheers,
    spanner
Re: grep for array-of-strings from an array-of-strings
by Improv (Pilgrim) on Apr 09, 2003 at 19:10 UTC
    You could use join to build your array into a single regex with or's, and then just do a single match against that.
      That'll match "this is a test". It needs to also contain 'another'. It's an AND boolean, not an OR boolean search...

      -nuffin
      zz zZ Z Z #!perl
        Oh.. if you want an AND, perhaps the look-ahead operator is what you want? I think it shouldn't be too hard to programmatically construct a regex with that style either..
      And here is some code for it:
      my $criteria = join "|", @searchCriteria;
      and then use it in the regex like this: /$criteria/
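
For comparison, here is the OR version run against the data from the question. The quotemeta call is an addition so that criteria are matched literally; the point of the sketch is that alternation keeps any string containing at least one criterion:

```perl
use strict;
use warnings;

my @largeArray = (
    "this is a test",
    "this is another test",
    "that is another test",
    "test this is another",
);
my @searchCriteria = ("another", "test");

# Alternation: a string matches if ANY one of the criteria is present.
my $criteria = join "|", map { quotemeta($_) } @searchCriteria;
my @anyMatch = grep { /$criteria/ } @largeArray;

print scalar(@anyMatch), " strings match /$criteria/\n";
print "$_\n" for @anyMatch;
```

All four strings pass, including "this is a test", which is why this form does not answer the original AND question.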

      Update: I realized that the 'and' case is much more difficult than the 'or' case that I presented. I would use eval for it:
      my $criteria = join "/ and /", @searchCriteria;
      $criteria = "/$criteria/";
      and then use it like this: grep {eval $criteria}. This is just your second code, dynamically generated.
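
A runnable sketch of that eval idea follows. Two details are additions rather than part of the reply: quotemeta, so that a criterion containing '/' or other metacharacters cannot break the generated code, and compiling the text once into a sub (named $check here) instead of calling eval for every element:

```perl
use strict;
use warnings;

my @largeArray = (
    "this is a test",
    "this is another test",
    "that is another test",
    "test this is another",
);
my @searchCriteria = ("another", "test");

# Generate the text "/another/ and /test/"; each match tests $_ implicitly.
my $criteria = join ' and ', map('/' . quotemeta($_) . '/', @searchCriteria);
print "$criteria\n";

# Compile once; the sub sees the $_ that grep sets for each element.
my $check = eval "sub { $criteria }" or die "eval failed: $@";

my @newArray = grep { $check->() } @largeArray;
print "$_\n" for @newArray;
```

String eval runs whatever text it is given, so this approach is best kept to criteria that come from a trusted source or have been escaped as above.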