
The "my_name" string is definitely included in the "db_name" string, but we can't know beforehand whether the "db_name" string contains other characters or words as well. What we do know for sure is that it contains all the words from the "my_name" string.

Re^3: Reg exps?
by toastbread (Novice) on Jan 14, 2010 at 18:08 UTC

    I hope I understood your problem correctly. Here is my sample code. Please check whether its flow and output fit your problem; if they don't, I hope it at least gives you some clues. As JavaFan said, check each word of "my_name" to see whether all of them exist in one of the strings in the "db_name" database.

```perl
#!/usr/bin/perl -w

@my_name=("Acidovorax","JS42");   #name to be searched...
                                  #...split by term/words
@db_name=("Acidovorax sp. JS42",  #data base of names in array
          "JS42 Acidovorax sp.",
          "JS42 sp. Acidovorax",
          "JS53 sp. Acidovorax",
          "JS42 sp. Axidovorax",
          "JS42Acidovorax sp. "
         );
my $ctr;

#----compare strings in $db_name 1 at a time
foreach my $db_each (@db_name){
    #----search each term/word of @my_name in $db_each
    foreach($ctr=0; $ctr<=$#my_name; $ctr++) {
        #----if a term/word not found break the loop
        last if($db_each !~ /\b$my_name[$ctr]\b/i);
    }
    #----this will be true if inner foreach didn't break
    if($ctr==$#my_name+1) {
        print "$db_each\n";   #print the matched name
    }
}
<>;
```

    I wrote "term/word" in the comments because an element of "my_name" might be a single word or a term made of two or more words separated by spaces, depending on how you split your "my_name".
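    If some elements of "my_name" are meant to be multi-word terms, one option is to require each term to appear as a whole phrase delimited by whitespace (or the ends of the string), escaping it with \Q...\E so that characters such as "." match literally. This is only a minimal sketch under that assumption; the sub name contains_all_terms is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every term in @$terms occurs in $text as a
# whole phrase. \Q...\E escapes regex metacharacters (including spaces), and
# the lookarounds require whitespace or a string end on both sides.
sub contains_all_terms {
    my ($text, $terms) = @_;
    for my $term (@$terms) {
        return 0 unless $text =~ /(?<!\S)\Q$term\E(?!\S)/i;
    }
    return 1;
}

my @terms = ("Acidovorax sp.", "JS42");    # first term spans two words
for my $db ("Acidovorax sp. JS42", "JS42 Acidovorax sp.", "JS42 sp. Acidovorax") {
    printf "%-22s %s\n", $db, contains_all_terms($db, \@terms) ? "match" : "no match";
}
```

    The first two names match; "JS42 sp. Acidovorax" does not, because the phrase "Acidovorax sp." never occurs in it.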

    The code above does not handle the case where a word must appear two or more times, in any order. For example, with my_name = "high class bacteria f7-52 high fever", a db_name that contains "high" only once would still match. I'm just mentioning this case in case you want more flexibility in your program; it's still up to you.
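    One way to cover that case is to count how often each word occurs on both sides and require db_name to contain at least as many copies of every word as my_name does. A sketch, assuming whitespace-separated, case-sensitive words; has_all_words is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every whitespace-separated word of $wanted
# occurs in $text at least as many times as it occurs in $wanted.
sub has_all_words {
    my ($text, $wanted) = @_;
    my (%have, %need);
    $have{$_}++ for split ' ', $text;     # word counts available in $text
    $need{$_}++ for split ' ', $wanted;   # word counts required by $wanted
    for my $word (keys %need) {
        return 0 if ($have{$word} || 0) < $need{$word};
    }
    return 1;
}

my $my_name = "high class bacteria f7-52 high fever";
for my $db ("high fever high class bacteria f7-52", "high class bacteria f7-52 fever") {
    printf "%-38s %s\n", $db, has_all_words($db, $my_name) ? "match" : "no match";
}
```

    The second name is rejected because it contains "high" only once.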

      If I change the assignment to @my_name to

          @my_name=("Acidovorax", "sp.", "JS42");

      the program fails to find a single match. I doubt this is what the OP wants. The problem is that you're using different "word" definitions for "my_name" and "db_name". Either split both on whitespace or both on \b; otherwise it becomes trivial to find examples where the search returns the wrong answer. For instance:
```perl
use 5.10.0;
use strict;
no warnings;

my @names    = ("Acidovorax JS42", "Acidovorax sp. JS42");
my @db_names = ("Acidovorax sp. JS42",  #data base of names in array
                "JS42 Acidovorax sp.",
                "JS42 sp. Acidovorax",
                "Acidovorax JS42",
                "JS53 sp. Acidovorax",
                "JS42 sp. Axidovorax",
                "JS42Acidovorax sp. "
);

foreach (@names) {
    say;
    my %name;
    @name{+split} = ();
    foreach (@db_names) {
        my %copy = %name;
        delete $copy{$_} for split;
        say "\t$_ is ", (keys %copy ? "not a " : "a "), "match";
    }
}

__END__
Acidovorax JS42
	Acidovorax sp. JS42 is a match
	JS42 Acidovorax sp. is a match
	JS42 sp. Acidovorax is a match
	Acidovorax JS42 is a match
	JS53 sp. Acidovorax is not a match
	JS42 sp. Axidovorax is not a match
	JS42Acidovorax sp.  is not a match
Acidovorax sp. JS42
	Acidovorax sp. JS42 is a match
	JS42 Acidovorax sp. is a match
	JS42 sp. Acidovorax is a match
	Acidovorax JS42 is not a match
	JS53 sp. Acidovorax is not a match
	JS42 sp. Axidovorax is not a match
	JS42Acidovorax sp.  is not a match
```
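      For comparison, a sketch of the other option: apply one \b-style definition to both strings by extracting runs of word characters (/\w+/g) from each. Under this assumption "sp." reduces to "sp" on both sides, while "JS42Acidovorax" stays a single word. The sub name word_match is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every \w+ token of $name is also a \w+ token
# of $db. Both strings are tokenized the same way, so "sp." becomes "sp".
sub word_match {
    my ($name, $db) = @_;
    my %db_words;
    $db_words{$_} = 1 for $db =~ /\w+/g;
    for my $word ($name =~ /\w+/g) {
        return 0 unless $db_words{$word};
    }
    return 1;
}

for my $db ("JS42 Acidovorax sp.", "Acidovorax JS42", "JS42Acidovorax sp. ") {
    printf "%-20s %s\n", $db, word_match("Acidovorax sp. JS42", $db) ? "match" : "not a match";
}
```

      Only "JS42 Acidovorax sp." matches here: "Acidovorax JS42" lacks "sp", and "JS42Acidovorax" is one token, so "Acidovorax" is missing.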
Re^3: Reg exps?
by fod (Friar) on Jan 14, 2010 at 18:48 UTC
    This is more or less redundant after toastbread's response, and quite possibly naive, but I've done it now and it seems to do what the OP wanted:
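```perl
use strict;
use warnings;

my $my_name = "Acidovorax JS42";
my $db_name = "Acidovorax sp. JS42";

my @names = split " ", $my_name;
my $numel = @names;    #number of words to match
my $count = 0;

foreach (@names) {
    # the word must be preceded by start-of-string or whitespace and
    # followed by whitespace or end-of-string; \Q...\E escapes metacharacters
    if ($db_name =~ /(?:^|\s)\Q$_\E(?:\s|$)/) {
        $count++;      #count matches
    }
}
print "MATCH\n" if $count == $numel;    #report match if all words match
```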
    --->Updated to fix regex - thx JavaFan
      You would have to escape $_ as it may contain regexp special characters. It also assumes "words" are delimited by whitespace and by \b at the same time - your program fails to find a match if $my_name=$db_name="Acidovorax sp. JS42" for instance.
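      To make both points concrete, this small sketch compares an unescaped, \b-delimited pattern with an escaped, whitespace-delimited one for the word "sp.". The sub names naive_match and escaped_match and the sample string "Acidovorax spp JS42" are made up for illustration.

```perl
use strict;
use warnings;

# Unescaped and \b-delimited: "." matches any character, and \b right after
# a literal "." needs a word character to follow it.
sub naive_match   { my ($db, $word) = @_; return $db =~ /\b$word\b/ ? 1 : 0 }

# Escaped and whitespace-delimited: "." is matched literally.
sub escaped_match { my ($db, $word) = @_; return $db =~ /(?<!\S)\Q$word\E(?!\S)/ ? 1 : 0 }

for my $db ("Acidovorax sp. JS42", "Acidovorax spp JS42") {
    printf "%s: naive=%d escaped=%d\n", $db, naive_match($db, "sp."), escaped_match($db, "sp.");
}
```

      The naive pattern misses the real "sp." and wrongly accepts "spp"; the escaped pattern does the opposite, which is what you want.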
        Ouch! I guess I should learn to program before I start (mis)representing myself as someone who knows what he's talking about. I've updated the post with what I think might possibly work. Thanks for the heads up.