I need some help with a regular expression. I have a file (file1) containing the following query strings that I need to match:
file1: GCGAT, CACGT
The target strings, against which the query strings need to be matched, are in file2:
GNGATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
GCGANBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
CNCGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
The conditions for a match are:
1. Each query string should be matched only at the beginning of the target string.
2. A query string can have an N at any position, so for a query string such as GCGAT we can have NCGAT, GNGAT, GCNAT, GCGNT, or GCGAN. Any of these strings should match the target strings.

How do I write a regular expression that covers all 6 possibilities (the five variants plus the original string, GCGAT)? I have the following code so far:
    # read each query string (file1) into an array
    while ($line1 = <INP1>) {
        chomp($line1);
        push(@barcode, $line1);
    }

    foreach $code (@barcode) {
        my $filename = $code;
        open(OUT, ">$filename") || die "$!\n";
        # The target strings are stored in a hash: the keys are 1, 2, 3, ...
        # and the values are the target strings.
        for my $data (keys %idhash) {
            my $value = $idhash{$data};
            chomp($code);
            # HOW DO I WRITE A REGULAR EXPRESSION HERE TO ALLOW ALL THE
            # 6 COMBINATIONS TO BE MATCHED PER QUERY STRING?
            if ($value =~ m/^$code/) {
                # where the query string matches the target string,
                # print the value of the hash
                print "$idhash{$data}\n";
            }
        }
    }

Thanks in advance,
biobee
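
To make the matching rule concrete, here is a minimal sketch of one possible way to build such a pattern; it is not a tested solution for the real data. It assumes that "an N at any position" means exactly one base replaced by N (so six alternatives per query string), and the helper name barcode_regex and the shortened target strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a compiled regex that matches, at the
# start of a string only, the barcode itself or any variant of it in
# which exactly one position has been replaced by N.
sub barcode_regex {
    my ($code) = @_;
    my @variants = ($code);
    for my $i (0 .. length($code) - 1) {
        my $variant = $code;
        substr($variant, $i, 1) = 'N';   # replace one base with N
        push @variants, $variant;
    }
    # Join the variants into one anchored alternation.
    my $alternation = join '|', map { quotemeta($_) } @variants;
    return qr/^(?:$alternation)/;
}

# Sample data based on the post; the target strings are shortened
# here for readability (an assumption, not the real file2 contents).
my @barcodes = qw(GCGAT CACGT);
my @targets  = qw(GNGATNNNNN GCGANBBBBB CNCGTNNNNN);

for my $code (@barcodes) {
    my $re = barcode_regex($code);
    for my $target (@targets) {
        print "$code matches $target\n" if $target =~ $re;
    }
}
```

If more than one N per barcode should also be accepted, a character class per position (for example [GN][CN][GN][AN][TN] for GCGAT) would be a looser alternative.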
In reply to string match using with an N in any position by biobee07