in reply to While loop printing only one fasta header file

Hi Monks,

1) I am reading a file fastaheaders.txt which contains something like this:

    >DFHGSUEIEEK
    >JKDHUEIEEOE
    >KDJIEEIOIEO

2) In the second file I am trying to search for these headers and print the lines when found. It looks like this:

    >DFHGSUEIEEK
    ACGTCGTACGATCGATCAGTACGTACGAT
    //
    >JKDHUEIEEOE
    ACGATGCGTACAGTACAGTACAGTACAGT
    //
    >KDJIEEIOIEO
    AGTCGTCGTAGTGTTTTACCCCCATGTCA
    //
    >HSKWJSSWWOW
    AGTAGTAGTAGTAGGGGTTTTTTTTACCC
    //
    >ADJHFHIOHFO
    ACGTGGGGGGGGGGTTATTACCCCCCCCA
    //
    >DTEEIJEJEOJE
    TTTTTTTTTTGGGGGGGGGACCCCCCCAT

3) I am trying to print only those FASTA records whose headers are in my fastaheaders.txt file, i.e., >DFHGSUEIEEK >JKDHUEIEEOE >KDJIEEIOIEO.

4) The problem is that my program does that, but like this:

    >DFHGSUEIEEK
    >DFHGSUEIEEK
    >DFHGSUEIEEK
    >DFHGSUEIEEK
    ACGTCGTACGATCGATCAGTACGTACGAT
    ACGTCGTACGATCGATCAGTACGTACGAT
    ACGTCGTACGATCGATCAGTACGTACGAT
    ACGTCGTACGATCGATCAGTACGTACGAT
    //
    >JKDHUEIEEOE
    >JKDHUEIEEOE
    >JKDHUEIEEOE
    >JKDHUEIEEOE
    >JKDHUEIEEOE
    ACGATGCGTACAGTACAGTACAGTACAGT
    ACGATGCGTACAGTACAGTACAGTACAGT
    ACGATGCGTACAGTACAGTACAGTACAGT
    ACGATGCGTACAGTACAGTACAGTACAGT

6) Here is my program code. Can you please help me figure out what's going wrong?

    #!/usr/bin/perl
    open(FD, "<fastaheaders.txt");
    @headers = <FD>;
    print "Enter the file name:";
    $fn=<STDIN>;
    open(FH,$fn) || die "Error opening the file:$fn";
    while (<FH>) {
        foreach $head(@headers) {
            if (($head =~ /$_/) .. ($_ =~ /\/\//)) {
                print "$_";
            }
        }
    }
    close FH;
    close FD;
    print "Loop Finished!";

Re: One more problem
by didess (Sexton) on Sep 11, 2008 at 08:00 UTC
    Hi!
    Don't you think that mixing the implicit variable ($_) with an explicit one ($head) is needlessly confusing?
    I think that, where you use $_, it may not refer to the line read by the outermost loop:
    it has probably been reused by the inner one.
    I'd rewrite the loop part of your script as follows.
    I also dropped the test for // from the if statement (I hope that's right):
        while ($Fh=<FH>) {
            chomp($Fh);
            foreach $head(@headers) {
                chomp($head);
                # exact comparison; a regex match on $Fh would also accept substrings
                if ($head eq $Fh) {
                    print "$Fh\n";
                }
            }
        }
    One remark about efficiency:
    I suppose you have (many?) more lines in the data file than in the keys file (fastaheaders.txt), so the data file should drive the outer loop, and the inner loop should stop as soon as a match is found:
        foreach line (data file)
            foreach line (keys file)
                if found
                    print
                    break inner loop (last)
                endif
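    The pseudocode above could be sketched in Perl roughly as follows. This is only an illustration: the helper name matching_lines and the sample data are made up, and it compares whole lines with eq rather than with a regex.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of @$data that equal one of @$keys.
sub matching_lines {
    my ($data, $keys) = @_;
    my @found;
    for my $line (@$data) {        # outer loop: the (large) data file
        for my $key (@$keys) {     # inner loop: the (small) key list
            if ($line eq $key) {
                push @found, $line;
                last;              # stop scanning keys after the first hit
            }
        }
    }
    return @found;
}

# Made-up sample data, already chomped.
my @data = ('>AAA', 'ACGT', '//', '>BBB', 'TTTT', '//', '>CCC', 'GGGG');
my @keys = ('>AAA', '>CCC');
print "$_\n" for matching_lines(\@data, \@keys);
```

    With many keys, a hash lookup (exists $wanted{$line}) would probably be simpler than the inner loop, but that goes beyond what the remark above suggests.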
Re: One more problem
by juster (Friar) on Sep 11, 2008 at 08:33 UTC

    Try to only put <c> or <code> tags around the code itself and use <p> tags around your regular message. This makes your question more readable.

    The logic problems in your code are centered on the range (..) operator. You can use Super Search to search the Tutorials for the "range operator" to learn more about it.

    The range operator is a flip-flop, like a switch. It returns true when it's on, and false when it's off. Once the left hand side of the .. becomes true, the switch turns on. The "switch" will stay on (and keep returning true) until it is triggered off, which happens when the condition to the right of the .. becomes true. One thing to remember is that it still returns true on the one evaluation where it switches off (see the Tutorials; it is hard to explain).

    If your program had only one loop you probably wouldn't notice this behavior. With a nested loop it goes wrong. After a matching header is found, the inner loop still checks every other header against the same line. The range operator ("switch"/flip-flop/..) is still on, so it returns true for each of those checks, and the line currently in $_ is printed once per header.
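    A small sketch of that behavior, using made-up records and headers (none of these names come from the original script). The first loop uses the flip-flop once per line; the second evaluates the same flip-flop once per header per line, as the nested loop in the question does.

```perl
use strict;
use warnings;

# Made-up sample data, already chomped.
my @lines   = ('>AAA', 'ACGT', '//', '>BBB', 'TTTT', '//');
my @headers = ('>AAA', '>ZZZ');

# Single loop: the flip-flop is evaluated once per line.
my @single;
for my $line (@lines) {
    push @single, $line if ($line eq '>AAA') .. ($line eq '//');
}

# Nested loop: the same flip-flop is evaluated once per header for each line,
# and while it is on it returns true for every one of those evaluations.
my @nested;
for my $line (@lines) {
    for my $head (@headers) {
        push @nested, $line if ($head eq $line) .. ($line eq '//');
    }
}

print "single: @single\n";   # single: >AAA ACGT //
print "nested: @nested\n";   # nested: >AAA >AAA ACGT ACGT //
```

    The "//" line shows up only once in the nested case because the first evaluation on that line switches the flip-flop off, so the remaining header checks return false.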

    and here's my take:

        #!/usr/bin/perl
        use warnings;   # YOU FORGOT THESE!
        use strict;

        open HEADF, '<fastaheaders.txt' or die "can't open fastaheaders.txt: $!";
        my @headers = grep { /\S/ } <HEADF>;   # skip blank lines
        close HEADF;
        chomp @headers;   # keep newlines out of the regexp

        # create a regexp to search for all headers at once
        # like this: (BLABLABLA|ASKDJFASD) will match either
        # of the two. quotemeta escapes any regexp metacharacters.
        my $regexp = '(' . join('|', map { quotemeta($_) } @headers) . ')';

        print 'Enter filename: ';
        my $fasta_fn = <STDIN>;
        chomp $fasta_fn;   # remove newline, paranoia

        open FASTAF, '<', $fasta_fn or die "can't open $fasta_fn: $!";
        while(<FASTAF>) {
            # $_ = the line of text from the file now...
            # m!! is like using // to match text, m just tells
            # perl to use a different surrounding character: !
            # m!$regexp!o by itself matches against $_ automagically!
            # (the o says only read the value in $regexp once,
            # it won't change anyway)
            # print by itself prints $_ automagically!
            print if(m!$regexp!o .. m!//! and not m!//!);
        }
        close FASTAF;

    Untested! READ Tutorials or perlop for the m and .. operators and maybe perlrequick for regular expressions.

Re: One more problem
by llancet (Friar) on Sep 11, 2008 at 07:55 UTC
    I cannot understand this statement:
    if (($head =~ /$_/) .. ($_ =~ /\/\//))
    What are you doing here?

    In fact, the most sensible way to do your job is to load the FASTA sequence file into a hash:
        my %fasta_source;
        my $tmp_title;
        open INSEQ,"<$file_fasta" or die "can't open $file_fasta: $!";
        while (<INSEQ>) {
            chomp;
            next if $_ eq '//';   # skip the record separator lines
            if ($_=~/^>/) {
                $tmp_title=$_;
            }
            else {
                $fasta_source{$tmp_title}.=$_;
            }
        }
        close INSEQ;
    This should put the FASTA file into the hash "fasta_source", and from there it is easy to do what you want.
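    To illustrate how the hash could then be used to print only the wanted records, here is a minimal sketch. It assumes the record layout shown in the question; the helper name read_fasta, the in-memory sample file, and the list of wanted headers are all made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA-like records from a filehandle into a hash
# keyed by header line. Separator lines ("//") and empty lines are skipped.
sub read_fasta {
    my ($fh) = @_;
    my (%seq_for, $title);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '//' or $line eq '';
        if ($line =~ /^>/) {
            $title = $line;
            $seq_for{$title} = '';
        }
        elsif (defined $title) {
            $seq_for{$title} .= $line;
        }
    }
    return %seq_for;
}

# Made-up data, read through an in-memory filehandle so the sketch is self-contained.
my $fasta = ">AAA\nACGT\n//\n>BBB\nTTTT\n//\n>CCC\nGGGG\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
my %seq_for = read_fasta($fh);
close $fh;

# Print only the records whose header is in the wanted list.
for my $wanted ('>AAA', '>CCC', '>ZZZ') {
    next unless exists $seq_for{$wanted};
    print "$wanted\n$seq_for{$wanted}\n";
}
```

    In a real script the wanted list would come from fastaheaders.txt (chomped), and the filehandle from the FASTA file named by the user.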