I have a regex whose behavior doesn't match my expectations.
The input data looks like this:
```
. transcript_id "g29202.t1"; gene_id "g29202"; gene_name "G42051"; xloc "XLOC_053322"; cmp_ref "G42051.1"; class_code "c"; tss_id "TSS54758";
. transcript_id "g29205.t1"; gene_id "g29205"; xloc "XLOC_053323"; class_code "u"; tss_id "TSS54760";
. transcript_id "g29176.t1"; gene_id "g29176"; xloc "XLOC_053324"; class_code "u"; tss_id "TSS54761";
. transcript_id "g29178.t1"; gene_id "g29178"; gene_name "G42030"; xloc "XLOC_053326"; cmp_ref "G42030.1"; class_code "o"; tss_id "TSS54763";
```
The following code works as expected:

```perl
use warnings;
use strict;

my $usage  = "perl select_bracker.pl [bracker gtf] [output id list]\n";
my $gfin   = shift or die $usage;
my $output = shift or die $usage;

open(IN,  '<',  $gfin)   or die "Cannot open $gfin: $!";
open(OUT, '>>', $output) or die "Cannot open $output: $!";

while (my $record = <IN>) {
    $record =~ s/\R//g;    # strip the line ending
    # capture the transcript ID and the class code from the attribute field
    if ($record =~ /^.*transcript_id "([^"]*).*class_code "([^"]*)/) {
        my $trans = $1;
        my $class = $2;
        # keep only transcripts whose class code is s, x or u
        if ($class eq 's' || $class eq 'x' || $class eq 'u') {
            print OUT "$trans\n";
        }
    }
}
close IN;
close OUT;
```

But if I replace `if ($class eq 's' || $class eq 'x' || $class eq 'u')` with `if ('sxu' =~ /$class/g)`, the script only partly works. The first line it reads with a particular `$class` value matches, but when two adjacent lines have the same `$class` value, the regex fails on the second one and its transcript ID is never printed (e.g. line 3 of the example input). I don't understand this at all, so any help would be much appreciated!

Alastair
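The symptom described above is consistent with how `m//g` behaves in scalar context: Perl stores the offset where the last successful match ended (readable with `pos()`) on the target string, the next `/g` match on that same string resumes from that offset, and a failed match resets it. The sketch below is a minimal demonstration under a few assumptions: it uses the class codes of the four sample records, a variable `$allowed` stands in for the literal `'sxu'` so its `pos()` behavior is easy to reason about, and `%wanted`, `@with_g`, `@without_g` and `@with_hash` are illustrative names. The two alternatives it compares (dropping `/g`, or an exact hash lookup) are suggestions, not part of the original script.

```perl
use strict;
use warnings;

# Class codes from the four sample records, in file order.
my @classes = ('c', 'u', 'u', 'o');

# Stand-in for the literal 'sxu'; the same string is reused on every pass.
my $allowed = 'sxu';
my (@with_g, @without_g, @with_hash);

# Hypothetical alternative: a hash of accepted class codes.
my %wanted = map { $_ => 1 } qw(s x u);

for my $class (@classes) {
    # Scalar-context m//g resumes at pos($allowed) and resets pos on failure.
    push @with_g, ($allowed =~ /$class/g) ? 1 : 0;

    # Without /g every match starts at offset 0; \Q...\E quotes metacharacters.
    push @without_g, ('sxu' =~ /\Q$class\E/) ? 1 : 0;

    # Exact lookup: no regex, no remembered position, no substring matches.
    push @with_hash, $wanted{$class} ? 1 : 0;
}

print "with /g:    @with_g\n";
print "without /g: @without_g\n";
print "hash:       @with_hash\n";
```

Running it should print `0 1 0 0` for the `/g` version (the second `u` record is missed because the search starts at offset 3, past the end of `sxu`, and only that failure resets the position) and `0 1 1 0` for the other two. The hash lookup is the stricter choice: a regex test against `sxu` would also accept any substring of it, such as `sx`.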
In reply to strange behavior of regex by biologistatsea