Hello,

I have a regex whose behavior doesn't match my expectations.

The input data looks like this:

. transcript_id "g29202.t1"; gene_id "g29202"; gene_name "G42051"; xloc "XLOC_053322"; cmp_ref "G42051.1"; class_code "c"; tss_id "TSS54758";

. transcript_id "g29205.t1"; gene_id "g29205"; xloc "XLOC_053323"; class_code "u"; tss_id "TSS54760";

. transcript_id "g29176.t1"; gene_id "g29176"; xloc "XLOC_053324"; class_code "u"; tss_id "TSS54761";

. transcript_id "g29178.t1"; gene_id "g29178"; gene_name "G42030"; xloc "XLOC_053326"; cmp_ref "G42030.1"; class_code "o"; tss_id "TSS54763";

The code below works fine:
use warnings;
use strict;

my $usage  = "perl select_bracker.pl [bracker gtf] [output id list]\n";
my $gfin   = shift or die $usage;
my $output = shift or die $usage;

open(IN,  '<',  $gfin)   or die "Cannot open $gfin: $!\n";
open(OUT, '>>', $output) or die "Cannot open $output: $!\n";

while (my $record = <IN>) {
    $record =~ s/\R//g;    # strip the line ending
    if ($record =~ /^.*transcript_id "([^"]*).*class_code "([^"]*)/) {
        my $trans = $1;
        my $class = $2;
        if ($class eq 's' | $class eq 'x' | $class eq 'u') {
            print OUT "$trans\n";
        }
    }
}
close IN;
close OUT;
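As an aside, `|` in the condition above is bitwise OR. It gives the right answer here because `eq` returns numeric-capable true/false values, but `||` (or `or`) is the usual logical operator. A set of wanted codes is also often written as a hash lookup, which avoids running a regex on `$class` at all. The following is a minimal sketch, not the poster's script: `wanted_transcript` is a hypothetical helper name, and the input is the four sample records from the question held in an array rather than read from a file.

```perl
use strict;
use warnings;

# Class codes to keep, as a hash: a lookup needs no regex and no /g.
my %keep = map { $_ => 1 } qw(s x u);

# Hypothetical helper: returns the transcript_id of a wanted record,
# or an empty list for unwanted or unparseable records.
sub wanted_transcript {
    my ($record) = @_;
    return unless $record =~ /transcript_id "([^"]*)".*class_code "([^"]*)"/;
    my ($trans, $class) = ($1, $2);
    return $keep{$class} ? $trans : ();
}

# The four sample records from the question.
my @records = (
    '. transcript_id "g29202.t1"; gene_id "g29202"; gene_name "G42051"; xloc "XLOC_053322"; cmp_ref "G42051.1"; class_code "c"; tss_id "TSS54758";',
    '. transcript_id "g29205.t1"; gene_id "g29205"; xloc "XLOC_053323"; class_code "u"; tss_id "TSS54760";',
    '. transcript_id "g29176.t1"; gene_id "g29176"; xloc "XLOC_053324"; class_code "u"; tss_id "TSS54761";',
    '. transcript_id "g29178.t1"; gene_id "g29178"; gene_name "G42030"; xloc "XLOC_053326"; cmp_ref "G42030.1"; class_code "o"; tss_id "TSS54763";',
);

my @ids;
for my $r (@records) {
    push @ids, wanted_transcript($r);
}
print "$_\n" for @ids;    # both adjacent class "u" records are kept
```

Because the lookup carries no state between calls, two adjacent records with the same class code are treated identically.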

But if, instead of if($class eq 's' | $class eq 'x' | $class eq 'u'), I use if('sxu' =~ /$class/g), the script works for the first line it reads with a particular $class value; when two adjacent lines share the same $class value, the regex doesn't match and the print statement doesn't run for the second line (e.g. line 3 of the example input, whose class "u" repeats that of line 2). I don't understand this at all, so any help would be much appreciated!

Alastair
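The symptom is consistent with how m//g behaves in scalar context: the search starts at the target string's pos(), which is left just past the previous match on success and reset on failure. After 'u' matches at the end of 'sxu', pos is 3, so the next search for 'u' starts at the end of the string and fails (and that failure resets pos, so a third 'u' would match again). The sketch below reproduces this under two assumptions: a variable `$codes` stands in for the literal 'sxu' so that pos() can be printed, and the run of class codes is made up for illustration.

```perl
use strict;
use warnings;

# Made-up run of class codes, as they might appear on consecutive lines.
my @classes = qw(c u u o u);

# A variable stands in for the literal 'sxu' so that pos() can be inspected.
my $codes = 'sxu';

my (@with_g, @without_g);
for my $class (@classes) {
    # Scalar-context m//g resumes the search at pos($codes), not at offset 0.
    my $hit = ($codes =~ /$class/g) ? 'yes' : 'no';
    my $pos = defined pos($codes) ? pos($codes) : 'undef';
    push @with_g, $hit;
    print "with /g:    class=$class matched=$hit pos=$pos\n";
}

for my $class (@classes) {
    # Without /g every match starts at the beginning of $codes.
    push @without_g, ($codes =~ /$class/) ? 'yes' : 'no';
}
print "without /g: @without_g\n";
```

With /g the second of the two adjacent 'u' codes fails; without /g all three match. So dropping /g fixes the test. Writing it as `$class =~ /\A[sxu]\z/` is a further option, since it also stops `$class` from being interpreted as a pattern (a class code that happened to be a regex metacharacter such as '.' would otherwise match 'sxu').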


In reply to strange behavior of regex by biologistatsea
