in reply to Reg exps?

Is it safe to assume that the beginning (Acidovorax) and end (JS42) must match, but the middle (sp.) doesn't have to? If so, you can split on a space and build a regex to enact this.
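A minimal sketch of that idea, assuming words are whitespace-delimited and that anything at all may sit between the first and last word. The helper name `first_last_regex` is made up for illustration and is not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern that anchors the first and last word
# of $my_name and allows any text (or nothing) in between.
sub first_last_regex {
    my ($my_name) = @_;
    my @words = split ' ', $my_name;
    return unless @words;                       # nothing to match against
    my ($first, $last) = @words[0, -1];
    # A single-word name has to match the whole string exactly.
    return qr/^\Q$first\E$/ if @words == 1;
    # \Q...\E escapes regex metacharacters such as the "." in "sp.".
    return qr/^\Q$first\E\s(?:.*\s)?\Q$last\E$/;
}

my $re = first_last_regex("Acidovorax JS42");
for my $db_name ("Acidovorax sp. JS42", "Acidovorax JS42", "JS42 sp. Acidovorax") {
    printf "%-22s %s\n", $db_name, ($db_name =~ $re ? "match" : "no match");
}
```

Note that this only covers the case where word order is fixed; it will not match reordered names such as "JS42 sp. Acidovorax".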

Re^2: Reg exps?
by Anonymous Monk on Jan 14, 2010 at 16:07 UTC
    The "my_name" string is definitely included in the "db_name" string, but we can't know beforehand if the "db_name" string contains other characters or words as well. But we sure know that it contains all words from "my_name" string.

      I hope I understood your problem. Here is my sample code; please check whether its flow and output fit your problem. If not, I hope it at least gives you some clues. As JavaFan said, check whether every word of "my_name" exists in one of the "db_name" strings in the database.

      #!/usr/bin/perl -w
      @my_name=("Acidovorax","JS42");   #name to be searched...
                                        #...split by term/words
      @db_name=("Acidovorax sp. JS42",  #data base of names in array
                "JS42 Acidovorax sp.",
                "JS42 sp. Acidovorax",
                "JS53 sp. Acidovorax",
                "JS42 sp. Axidovorax",
                "JS42Acidovorax sp. "
               );
      my $ctr;
      #----compare strings in $db_name 1 at a time
      foreach my $db_each (@db_name){
          #----search each term/word of @my_name in $db_each
          foreach($ctr=0; $ctr<=$#my_name; $ctr++) {
              #----if a term/word not found break the loop
              last if($db_each !~ /\b$my_name[$ctr]\b/i);
          }
          #----this will be true if inner foreach didn't break
          if($ctr==$#my_name+1) {
              print "$db_each\n";       #print the matched name
          }
      }
      <>;

      I mention "term/word" in the comments because a term in your "my_name" might be a single word, or two or more words separated by a space.

      The code above does not handle the case where a word must occur two or more times, in any order. Example: my_name = "high class bacteria f7-52 high fever". I mention this condition only in case you want your program to be more flexible; it's up to you.
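      One possible way to cover the repeated-word case is to count words instead of testing each one once. This is only a sketch: the helper name `contains_all_words` is hypothetical, it assumes whitespace-delimited words, and it compares case-insensitively to mirror the /i in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every word of $my_name occurs in $db_name at
# least as many times as it occurs in $my_name, in any order.
sub contains_all_words {
    my ($my_name, $db_name) = @_;

    # Count how many times each word is still required.
    my %need;
    $need{lc $_}++ for split ' ', $my_name;

    # Tick off one requirement per matching word found in $db_name.
    for my $word (split ' ', $db_name) {
        my $key = lc $word;
        next unless $need{$key};
        delete $need{$key} if --$need{$key} == 0;
    }

    # Anything left in %need was not found often enough.
    return !%need;
}

my $my_name = "high class bacteria f7-52 high fever";
for my $db_name ("high fever class f7-52 bacteria high",
                 "high fever class f7-52 bacteria") {
    print "$db_name: ", (contains_all_words($my_name, $db_name) ? "match" : "no match"), "\n";
}
```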

        If I change the assignment to @my_name to
        @my_name=("Acidovorax", "sp.", "JS42");
        the program fails to find a single match. I doubt this is what the OP wants. The problem is that you're using different "word" definitions for "my_name" and "db_name". Either split both on whitespace, or both on \b, or it becomes trivial to find examples where the search is going to return the wrong answer. For instance:
        use 5.10.0;
        use strict;
        no warnings;

        my @names = ("Acidovorax JS42", "Acidovorax sp. JS42");
        my @db_names = ("Acidovorax sp. JS42",  #data base of names in array
                        "JS42 Acidovorax sp.",
                        "JS42 sp. Acidovorax",
                        "Acidovorax JS42",
                        "JS53 sp. Acidovorax",
                        "JS42 sp. Axidovorax",
                        "JS42Acidovorax sp. "
                       );

        foreach (@names) {
            say;
            my %name;
            @name{+split} = ();
            foreach (@db_names) {
                my %copy = %name;
                delete $copy{$_} for split;
                say "\t$_ is ", (keys %copy ? "not a " : "a "), "match";
            }
        }
        __END__
        Acidovorax JS42
                Acidovorax sp. JS42 is a match
                JS42 Acidovorax sp. is a match
                JS42 sp. Acidovorax is a match
                Acidovorax JS42 is a match
                JS53 sp. Acidovorax is not a match
                JS42 sp. Axidovorax is not a match
                JS42Acidovorax sp. is not a match
        Acidovorax sp. JS42
                Acidovorax sp. JS42 is a match
                JS42 Acidovorax sp. is a match
                JS42 sp. Acidovorax is a match
                Acidovorax JS42 is not a match
                JS53 sp. Acidovorax is not a match
                JS42 sp. Axidovorax is not a match
                JS42Acidovorax sp. is not a match
      This is more or less redundant after toastbread's response and quite possibly naive but I've done it now and it seems to do what the OP wanted:
      use strict;
      use warnings;

      my $my_name="Acidovorax JS42";
      my $db_name="Acidovorax sp. JS42";
      my @names = split " ", $my_name;
      my $numel = @names;   #number of words to match
      my $count = 0;        #start at 0 so the comparison below never sees undef
      foreach (@names) {
          #group the alternations; ungrouped, the bare ^ branch always matches
          if ($db_name=~/(?:^|\s)\Q$_\E(?:\s|$)/) {
              $count++;     #count matches
          }
      }
      print "MATCH" if $count==$numel;  #report match if all words match
      --->Updated to fix regex - thx JavaFan
        You would have to escape $_ as it may contain regexp special characters. It also assumes "words" are delimited by whitespace and by \b at the same time - your program fails to find a match if $my_name=$db_name="Acidovorax sp. JS42" for instance.
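        A sketch of one way to address both points, assuming whitespace is the only word delimiter: escape each word with \Q...\E and use whitespace lookarounds instead of \b. The helper name `all_words_match` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every whitespace-delimited word of $my_name
# appears as a whole whitespace-delimited word in $db_name.
sub all_words_match {
    my ($my_name, $db_name) = @_;
    for my $word (split ' ', $my_name) {
        # (?<!\S) and (?!\S) mean "not adjacent to a non-space character",
        # so the word must be bounded by whitespace or the string ends.
        # \Q...\E escapes metacharacters such as the "." in "sp.".
        return 0 unless $db_name =~ /(?<!\S)\Q$word\E(?!\S)/;
    }
    return 1;
}

my $name = "Acidovorax sp. JS42";
print "identical strings: ", (all_words_match($name, $name) ? "match" : "no match"), "\n";
print "glued words: ",
      (all_words_match("Acidovorax JS42", "JS42Acidovorax sp. ") ? "match" : "no match"), "\n";
```

        Unlike \b, the lookarounds do not care whether a word ends in a word character, so "sp." is handled the same way as "JS42".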