MonkPaul has asked for the wisdom of the Perl Monks concerning the following question:

Dear All,

I am trying to find a way to match a word in all of its possible letter cases.

In a file I have some data like:

>gi|1870687|gb|U77948.1|HSU77948 Human Bruton's tyrosine kinase-associated protein-135 mRNA,
>gi|22415887|emb|AL683818.12| Mouse DNA sequence from clone RP23-229B23 on chromosome 4, complete

I am also looking for the Latin names, such as Homo sapiens. What I want to avoid is missing hits that could be "human" instead of "Human", or "mouse" instead of "Mouse"; i.e. I want to look for all combinations of uppercase and lowercase letters in the words. I have an inkling that I could use \u and \U or \L and \l.

What I am stuck on is how to use them in this if statement, where the word is stored in the $filter scalar:

if($current_subject =~ /\b\Q$filter\E\b/) # filter and get exact name

Note: $current_subject holds the current line of the file.

Thanks

Re: Checking for all cases of a word
by prasadbabu (Prior) on Jul 04, 2005 at 14:15 UTC

    If I understood your question correctly, here is my suggestion:

    You can use the 'i' (ignore case) regex modifier.

    Also, have a look at perlre.

    if($current_subject =~ /\b\Q$filter\E\b/i)
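A minimal, self-contained sketch of the /i approach, assuming the two header lines from the question as sample input and a hypothetical lowercase search term 'human'. It keeps the original \b and \Q...\E pattern and only adds the /i modifier:

```perl
use strict;
use warnings;

# Two header lines taken from the question's sample data.
my @lines = (
    q{>gi|1870687|gb|U77948.1|HSU77948 Human Bruton's tyrosine kinase-associated protein-135 mRNA,},
    q{>gi|22415887|emb|AL683818.12| Mouse DNA sequence from clone RP23-229B23 on chromosome 4, complete},
);

# Hypothetical search term, deliberately in lowercase.
my $filter = 'human';

my @matches;
for my $current_subject (@lines) {
    # \Q...\E quotes any regex metacharacters in $filter,
    # \b anchors on word boundaries, and /i makes the match case-insensitive.
    if ($current_subject =~ /\b\Q$filter\E\b/i) {
        push @matches, $current_subject;
    }
}

print "match: $_\n" for @matches;
```

Only the "Human" line matches, even though $filter is lowercase; no case variants have to be generated.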

    Prasad

Re: Checking for all cases of a word
by Nevtlathiel (Friar) on Jul 04, 2005 at 14:19 UTC
    There's no need to upper- or lowercase everything; you can add the i modifier to the regex for a case-insensitive match, e.g.:

    if($current_subject =~ /\b\Q$filter\E\b/i)

    ----------
    My cow-orkers were talking in punctuation the other day. What disturbed me most was that I understood it.

      Cheers, everybody.
      A simple solution, but when you don't know it, your code begins to grow in size as you try to work around the problem.
Re: Checking for all cases of a word
by Limbic~Region (Chancellor) on Jul 04, 2005 at 17:55 UTC
    MonkPaul,
    As you can see, the language offers easier ways to do this. In addition to the /i regex modifier, another technique is to convert copies of both strings to a single case using uc or lc.
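A rough sketch of the lc technique, assuming one header line from the question and a hypothetical search term 'MOUSE'. Both the line and the term are lowercased into copies, so a plain case-sensitive match is enough:

```perl
use strict;
use warnings;

my $current_subject = q{>gi|22415887|emb|AL683818.12| Mouse DNA sequence from clone RP23-229B23 on chromosome 4, complete};
my $filter = 'MOUSE';    # hypothetical search term in a different case

# Lowercase copies of both strings; the originals are left untouched.
my $lc_subject = lc $current_subject;
my $lc_filter  = lc $filter;

# Ordinary (case-sensitive) match on the lowercased copies.
my $found = $lc_subject =~ /\b\Q$lc_filter\E\b/ ? 1 : 0;
print $found ? "found\n" : "not found\n";
```

On Perl 5.16 or later, fc (enabled with `use feature 'fc'`) does proper Unicode case folding and is generally a safer choice than lc for case-insensitive comparisons.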

    The important thing is to recognize when an approach is too inefficient to be of any use, and to abandon it in search of a better way. Your approach needs 2^N checks, where N is the number of letters in the word. While it is certainly possible to generate every case variation (see this for both Perl 5 and Perl 6 examples), 536,870,912 checks (2^29, for a 29-letter word) for the longest word seem counter-productive. It isn't always easy to program efficient Perl efficiently, but it can be a fun distraction.
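For illustration only, here is a minimal Perl 5 sketch of generating every case variant of a word, which makes the 2^N growth concrete. It is not the Perl 5/Perl 6 code mentioned above, and `case_variants` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Return every upper/lower-case spelling of $word.
# Each letter doubles the number of variants, so N letters give 2**N.
sub case_variants {
    my ($word) = @_;
    my @variants = ('');
    for my $ch (split //, $word) {
        # Digits and punctuation have no case, so they add no new variants.
        my @forms = lc($ch) eq uc($ch) ? ($ch) : (lc($ch), uc($ch));
        my @next;
        for my $prefix (@variants) {
            push @next, map { $prefix . $_ } @forms;
        }
        @variants = @next;
    }
    return @variants;
}

my @variants = case_variants('Mouse');
print scalar(@variants), " variants of 'Mouse'\n";    # 32 variants of 'Mouse'
print "2**29 = ", 2**29, "\n";                        # 2**29 = 536870912
```

Five letters already give 32 spellings to test, while a single /i match covers all of them at once.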

    Cheers - L~R