ablakely has asked for the wisdom of the Perl Monks concerning the following question:

I have some really simple code that just doesn't seem to work when I use a variable in the regex. A literal regex works fine.

Here's my code:

#!/usr/bin/perl -w
my $seq;
open (SEQ, "<$ARGV[0]");
open (REP, "<$ARGV[1]");
open (OUT, ">$ARGV[2]");
while (<SEQ>){
    $seq = $_;
    while (<REP>){
        print OUT if /$seq/;
    }
}
close (SEQ);
close (REP);
close (OUT);

Right now I just get an empty output file. If I replace /$seq/ with /AAA/ or any other literal, it works just fine.

I'm sure it's something easy, but I've been googling all afternoon to no avail.

Here are examples of the contents of the SEQ and REP files:

SEQ:
AELIVQPELK

REP:
1 116.68 116.68 48.2199996709824 26.8999993801117 21.1799994111061 ENST00000379802_141 [201 - 8954] cdna:known chromosome:GRCh37:6:7541808:7586950:1 gene:ENSG00000096696 gene_biotype:protein_coding transcript_biotype:protein_coding 2 99.0000009536743 AELIVQPELK -5.64623987884261E-05 1138.65991210938 570.3372 1138.65979003906 570.337158203125 2 14 2.1.1.3480.1 1 0.7224 -1 -1

Thanks for your help!

Update: Got what I wanted with the following code. Thanks for the input, everyone, it was helpful. I was unaware of the mechanics of a while loop that uses an open file as its condition. I'm sure it's still a little messy, but I'm a biologist, not a programmer, so what can you expect?

#!/usr/bin/perl -w
use strict;
my (@seq, @rep, $i, $n, $l, $t);
open (SEQ, "<$ARGV[0]");
open (REP, "<$ARGV[1]");
open (OUT, ">$ARGV[2]");
$i=0;
$n=0;
$l=0;
while (<SEQ>){
    chomp();
    $seq[$i] = $_;
    $i++;
}
while (<REP>){
    $rep[$n] = $_;
    $n++;
}
while ($l < $i){
    $t=0;
    while ($t < $n){
        print OUT $rep[$t] if $rep[$t] =~ /$seq[$l]/;
        $t++;
    }
    $l++;
}
close (SEQ);
close (REP);
close (OUT);

Re: Simple filter from rows of input file
by toolic (Bishop) on Aug 13, 2014 at 20:25 UTC
Re: Simple filter from rows of input file
by Laurent_R (Canon) on Aug 13, 2014 at 21:51 UTC
    In addition to what you've been told about chomp and quotemeta, this code:
    open (SEQ, "<$ARGV[0]");
    open (REP, "<$ARGV[1]");
    open (OUT, ">$ARGV[2]");
    while (<SEQ>){
        $seq = $_;
        while (<REP>){
            print OUT if /$seq/;
        }
    }
    is not going to do what you want. Once REP has been read through for the first value found in SEQ, your while (<REP>) loop is never going to loop over the file again: the filehandle is at end of file, so the next read returns undef immediately. You need either to close the file and re-open it, or to use the seek function to get back to the beginning. So, in brief, this code is completely broken.
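
For illustration only, here is a minimal sketch of the seek variant mentioned above. The sub name filter_with_rewind and the in-memory sample data are made up for this example; it also assumes each SEQ line should be matched as a literal substring, so it applies chomp and \Q...\E.

```perl
use strict;
use warnings;

# Collect every REP line that contains any SEQ line as a literal substring.
# Both arguments are filehandles already opened for reading.
sub filter_with_rewind {
    my ($seq_fh, $rep_fh) = @_;
    my @matches;
    while (my $seq = <$seq_fh>) {
        chomp $seq;                  # drop the trailing newline
        next unless length $seq;     # skip blank lines
        # Rewind REP to the start. Without this, the inner loop only
        # does any work for the first SEQ line.
        seek($rep_fh, 0, 0) or die "Cannot rewind REP: $!";
        while (my $line = <$rep_fh>) {
            push @matches, $line if $line =~ /\Q$seq\E/;
        }
    }
    return @matches;
}

# Hypothetical in-memory stand-ins for the SEQ and REP files
my $seq_data = "AAA\nCCC\n";
my $rep_data = "1 AAA x\n2 GGG y\n3 CCC z\n";
open my $seq_fh, '<', \$seq_data or die "Cannot open SEQ data: $!";
open my $rep_fh, '<', \$rep_data or die "Cannot open REP data: $!";

print filter_with_rewind($seq_fh, $rep_fh);
```

This re-reads REP once per SEQ line, so it is only reasonable for small inputs.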

    Having said that, I was only pointing out this error in your code to make you aware of the problem. I am really not suggesting that you should read the whole second file each time through your process. This would be immensely wasteful.

    If I understand correctly what you are trying to do, you should probably just read the first file once, chomp each line and store it in a hash. Then you can read the other file and do what you want, depending on whether the hash element exists or not. But that works only if you are looking for exact matches (in which case the quotemeta thing might not be needed). Well, in brief, you should give us some more details on the contents of the two input files.
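
A rough sketch of the hash approach, under a stated assumption: each sequence appears in a REP line as a complete whitespace-separated field (as AELIVQPELK does in the sample posted in the question). The sub name filter_by_field and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Return the lines from $rep_fh that contain, as a whole
# whitespace-separated field, any sequence listed in $seq_fh.
sub filter_by_field {
    my ($seq_fh, $rep_fh) = @_;

    # Read the sequence file once and remember each sequence.
    my %wanted;
    while (my $seq = <$seq_fh>) {
        chomp $seq;
        next unless length $seq;     # skip blank lines
        $wanted{$seq} = 1;
    }

    # Read the second file once; each field costs one hash lookup.
    my @kept;
    while (my $line = <$rep_fh>) {
        my @fields = split ' ', $line;   # split on runs of whitespace
        push @kept, $line if grep { exists $wanted{$_} } @fields;
    }
    return @kept;
}

# Hypothetical in-memory sample data
my $seq_data = "AELIVQPELK\nMKTAYIAK\n";
my $rep_data = "1 116.68 AELIVQPELK 570.3372\n"
             . "2 99.10 GGGGGGGGGG 480.1200\n"
             . "3 87.45 MKTAYIAK 455.9000\n";
open my $seq_fh, '<', \$seq_data or die "Cannot open SEQ data: $!";
open my $rep_fh, '<', \$rep_data or die "Cannot open REP data: $!";

print filter_by_field($seq_fh, $rep_fh);
```

If the sequences can occur inside a longer field, the exact-match lookup no longer applies and a regex or index() search per sequence would be needed instead.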

Re: Simple filter from rows of input file
by 2teez (Vicar) on Aug 13, 2014 at 20:32 UTC

    Hi ablakely,

    From the code you've written, I gather that you read from one file, look for what you read in a second file, and write the matches out to a third file.
    If that is correct, then do this: read the first file into a hash, then read the second file one line at a time, match it against what you have in the hash, and simply print the matches to the third file. QED!
    You can simply write a subroutine that opens and reads a file, and use it to read the first two files!
    Also, don't forget to use strict; in your script! And check whether opening each file failed!
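
One possible shape for such a subroutine, as a sketch rather than a definitive version: the name read_lines is made up here, and the demo writes a throwaway temp file so that the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a file and return its lines with trailing newlines removed.
# Dies with the system error message if the file cannot be opened.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Demo with a temp file standing in for a real input file
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "AELIVQPELK\nMKTAYIAK\n";
close $tmp_fh or die "Cannot close '$tmp_path': $!";

my @seqs = read_lines($tmp_path);
print scalar(@seqs), " sequences read\n";

# A missing file is reported instead of silently producing no output
my @none = eval { read_lines("$tmp_path.missing") };
print "Error: $@" if $@;
```

The same sub can then be called once per input file, for example with $ARGV[0] and $ARGV[1].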

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: Simple filter from rows of input file
by Anonymous Monk on Aug 13, 2014 at 23:39 UTC
    add use autodie; because open can fail :)
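
A small sketch of what that looks like in practice. The path below is hypothetical and is assumed not to exist; the eval is only there so the demo can show the error instead of exiting.

```perl
use strict;
use warnings;
use autodie;   # open, close and friends now die on failure

# Hypothetical path, assumed not to exist
my $missing = '/no/such/dir/seq.txt';

my $err;
my $ok = eval {
    open my $fh, '<', $missing;   # no explicit "or die" needed
    close $fh;
    1;
};
$err = $@ unless $ok;

print "open failed: $err\n" if defined $err;
```

Without the eval, the script would stop at the failed open with a message naming the file.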
Re: Simple filter from rows of input file
by Anonymous Monk on Aug 13, 2014 at 20:25 UTC

    Perhaps you just need to chomp($seq)?

    Also, if you don't want special characters in the SEQ file to be interpreted as part of the regex, you should write /\Q$seq\E/ (see quotemeta).
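
To make both points concrete, here is a short sketch with made-up sample strings. It shows the trailing newline defeating the match, and a '.' in the pattern acting as a wildcard unless it is quoted.

```perl
use strict;
use warnings;

# Hypothetical REP line in which the sequence is followed by a space
my $line = "2 99.0 AELIVQPELK 570.3372 -1\n";

# As read from the SEQ file, the trailing newline is part of the pattern,
# so the regex demands a newline right after the sequence.
my $seq = "AELIVQPELK\n";
print "before chomp: ", ($line =~ /$seq/ ? "match" : "no match"), "\n";

chomp $seq;
print "after chomp:  ", ($line =~ /$seq/ ? "match" : "no match"), "\n";

# Regex metacharacters: '.' matches any character unless quoted
my $pattern = "2.1.1";
my $other   = "2x1y1";
print "unquoted: ", ($other =~ /$pattern/     ? "match" : "no match"), "\n";
print "quoted:   ", ($other =~ /\Q$pattern\E/ ? "match" : "no match"), "\n";
```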