MonkPaul has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monkies,

I have a question: can I use a second regex when searching a line in a file?

For instance, I am currently looking for Homo sapiens in one line of a file, but I also want to look for, say, chromosome 11. At the moment I save the data filtered by the Homo sapiens search into an array, then search that array for chromosome 11.

    my @filtered_subjects;
    for (my $i = 0; $i < scalar @line; $i++) {
        if ($line[$i] =~ /^>/) {                 # header lines start with '>'
            my $current_subject = $line[$i];
            chomp($current_subject);
            push(@filtered_subjects, $current_subject);
        }
    }
    my @element;

So the array contains:

    >gi|14670349|ref|NM_032999.1| Homo sapiens general transcription factor II, i (GTF2I), transcript
    >gi|2827179|gb|AF035737.1|AF035737 Homo sapiens general transcription factor 2-I (GTF2I) mRNA, complete
    >gi|19908489|gb|AF343351.1| Mus musculus TFII-I repeat domain-containing protein 3 beta 7 mRNA,
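To see why the species test looks at $element[4], here is a minimal sketch that splits the three sample headers above on the pipe character. The data is copied from the array contents above; the @descriptions variable is illustrative only.

```perl
use strict;
use warnings;

# The three header lines shown above ('|'-separated BLAST/FASTA-style headers).
my @filtered_subjects = (
    '>gi|14670349|ref|NM_032999.1| Homo sapiens general transcription factor II, i (GTF2I), transcript',
    '>gi|2827179|gb|AF035737.1|AF035737 Homo sapiens general transcription factor 2-I (GTF2I) mRNA, complete',
    '>gi|19908489|gb|AF343351.1| Mus musculus TFII-I repeat domain-containing protein 3 beta 7 mRNA,',
);

# split treats its first argument as a regex, so the pipe must be escaped.
# Field 4 (counting from 0) is the free-text description holding the species.
my @descriptions = map { my @f = split /\|/, $_; $f[4] } @filtered_subjects;

print "[$_]\n" for @descriptions;
```

Note that the second header has an extra accession before the species name in field 4, so a substring match (as in the loop below) is needed rather than an anchored one.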

Then I go through each element looking for the species, which comes from an HTML list:
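    foreach my $z (@filtered_subjects) {
        chomp $z;                                # already chomped above; harmless
        @element = split /\|/, $z;               # '|' must be escaped in the pattern
        if ($element[4] =~ /\Q$Homosapiens\E/) { # field 4 holds the description
            push(@chromoLine, $z);
        }
    }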
Is this the best way to do it, or can I use something like:

    if ($element[4] =~ /\Q$Homosapiens$choromosome\E/)
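A quick way to see what the proposed pattern does: \Q...\E quotes the concatenation of both variables, so it becomes one literal string and only matches when the two terms sit directly next to each other. The search terms and descriptions below are made-up values for illustration.

```perl
use strict;
use warnings;

# Hypothetical search terms; in the question they come from an HTML list.
my $Homosapiens = 'Homo sapiens';
my $choromosome = 'chromosome 11';

# Made-up descriptions, for illustration only.
my $adjacent  = 'Homo sapienschromosome 11 sequence';   # terms touch
my $separated = 'Homo sapiens gene X, chromosome 11';   # terms apart

# The pattern is the single literal "Homo sapienschromosome 11".
my $combined_adjacent  = $adjacent  =~ /\Q$Homosapiens$choromosome\E/ ? 1 : 0;
my $combined_separated = $separated =~ /\Q$Homosapiens$choromosome\E/ ? 1 : 0;

print "adjacent: $combined_adjacent, separated: $combined_separated\n";
```

Since real descriptions put other words between the species and the chromosome, the concatenated form would miss them.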

Any help is appreciated.

thanks,
MonkPaul.

Re: Double RegEx
by ikegami (Patriarch) on Jun 27, 2005 at 14:38 UTC

    If you know the order in which they'll appear in the string being matched,

    if($element[4] =~ /\Q$Homosapiens\E.*\Q$choromosome\E/)

    will work fine. If you don't know the order, you want

    if ($element[4] =~ /^ (?=.*?\Q$Homosapiens\E) (?=.*?\Q$choromosome\E) /x)

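    It reads: "At the start of the line, look ahead for Homo sapiens anywhere; then, still at the start of the line, look ahead for the chromosome anywhere." Because lookaheads don't consume characters, both searches begin at the start of the line, so the order of the two terms doesn't matter.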
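The two patterns above can be compared side by side. This is a minimal sketch using hypothetical search terms and made-up descriptions: the .* version needs the species before the chromosome, while the lookahead version accepts either order.

```perl
use strict;
use warnings;

my $Homosapiens = 'Homo sapiens';    # hypothetical search terms
my $choromosome = 'chromosome 11';

my @descriptions = (                  # made-up, for illustration only
    'Homo sapiens gene X, chromosome 11',   # species first
    'chromosome 11 region, Homo sapiens',   # chromosome first
    'Mus musculus gene X, chromosome 11',   # wrong species
);

# Ordered: .* allows anything between the two quoted literals,
# but the species must come before the chromosome.
my @ordered = grep { /\Q$Homosapiens\E.*\Q$choromosome\E/ } @descriptions;

# Unordered: each (?=...) peeks ahead from the start without consuming input.
# /x ignores bare whitespace, but \Q escapes the spaces inside the terms,
# so "Homo sapiens" is still matched literally.
my @any_order = grep {
    /^ (?=.*?\Q$Homosapiens\E) (?=.*?\Q$choromosome\E) /x
} @descriptions;

print "ordered:   $_\n" for @ordered;
print "any order: $_\n" for @any_order;
```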

    Update: I overlooked the obvious:

    if ($element[4] =~ /\Q$Homosapiens\E/ && $element[4] =~ /\Q$choromosome\E/)

    And since you're searching for literal strings rather than patterns, index avoids the regex engine entirely and may be even faster:

    if (index($element[4], $Homosapiens) >= 0 && index($element[4], $choromosome) >= 0)
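A small sketch, again with hypothetical terms and made-up descriptions, checking that the two-regex && test and the index version select the same lines. index returns -1 when the substring is absent, which is why the test is >= 0.

```perl
use strict;
use warnings;

my $Homosapiens = 'Homo sapiens';    # hypothetical search terms
my $choromosome = 'chromosome 11';

my @descriptions = (                  # made-up, for illustration only
    'Homo sapiens gene X, chromosome 11',
    'chromosome 11 region, Homo sapiens',
    'Mus musculus gene X, chromosome 11',
);

# Two independent regex tests joined with &&; order does not matter.
my @by_regex = grep {
    /\Q$Homosapiens\E/ && /\Q$choromosome\E/
} @descriptions;

# index() treats the terms as plain text; -1 means "not found".
my @by_index = grep {
    index($_, $Homosapiens) >= 0 && index($_, $choromosome) >= 0
} @descriptions;

print "regex: ", scalar(@by_regex), ", index: ", scalar(@by_index), "\n";
```

Whether index is noticeably faster on real BLAST output is best checked with the core Benchmark module rather than assumed.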
      Excellent, that's what I needed.
      I was not sure if people would understand what I was after. I will just run this in my looping system and see if blast-off occurs.
      Will keep you posted.

      cheers.

      One other problem I have just found: at the moment it's finding "chromosome 2*", where the * stands for any further digits. How can I make it match only chromosome 2 and not, say, 20?

      MonkPaul.

        /chromosome 2(?!\d)/
        "chromosome 2", not followed by a digit.
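A minimal sketch of the (?!\d) guard against made-up descriptions (not real BLAST output): it keeps chromosome 2 whether it ends the string or is followed by punctuation, and rejects chromosome 20.

```perl
use strict;
use warnings;

# Made-up descriptions, for illustration only.
my @descriptions = (
    'Homo sapiens gene X, chromosome 2',
    'Homo sapiens gene Y, chromosome 20',
    'Homo sapiens gene Z, chromosome 2, clone 7',
);

# (?!\d) is a zero-width negative lookahead: it succeeds only if the
# character after "2" is not a digit, or if there is no character at all.
my @chr2 = grep { /chromosome 2(?!\d)/ } @descriptions;

print "$_\n" for @chr2;
```

With the chromosome held in a variable, as in the replies above, the same guard would be /\Q$choromosome\E(?!\d)/.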
Re: Double RegEx
by rev_1318 (Chaplain) on Jun 27, 2005 at 14:38 UTC
    Where do you use the 'Double RegEx'? Currently, your entire code could be rewritten as:
    chomp @lines;
    foreach (@lines) {
        next unless /^>/;
        push @chromoLine, $_ if (split /\|/)[4] =~ /\Q$Homosapiens\E/;
    }
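Here is the single-pass rewrite wrapped into a runnable script, as a sketch. The input lines and the value of $Homosapiens are assumptions: in practice @lines would be read from the BLAST output file and the species from the HTML list.

```perl
use strict;
use warnings;

# Hypothetical inputs for illustration.
my $Homosapiens = 'Homo sapiens';
my @lines = (
    ">gi|14670349|ref|NM_032999.1| Homo sapiens general transcription factor II, i (GTF2I), transcript\n",
    ">gi|19908489|gb|AF343351.1| Mus musculus TFII-I repeat domain-containing protein 3 beta 7 mRNA,\n",
    "(alignment text, not a header)\n",
);
my @chromoLine;

chomp @lines;                       # strip newlines from every element at once
foreach (@lines) {
    next unless /^>/;               # keep only header lines
    # (split /\|/)[4] is a list slice: split $_ on '|' and take field 4
    push @chromoLine, $_ if (split /\|/)[4] =~ /\Q$Homosapiens\E/;
}

print "$_\n" for @chromoLine;
```

Extending the condition with the && test or the (?!\d) guard from the other reply gives the single-pass, two-condition filter the question asked about.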

    Paul