ashnator has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have a simple problem and would like your help.

1) I am reading a file, fastaheaders.txt, which contains something like this:

    >DFHGSUEIEEK
    >JKDHUEIEEOE
    >KDJIEEIOIEO

2) In the second file I am trying to search for those headers and print the lines when found. It looks like this:

    >DFHGSUEIEEK
    ACGTCGTACGATCGATCAGTACGTACGAT
    //
    >JKDHUEIEEOE
    ACGATGCGTACAGTACAGTACAGTACAGT
    //
    >KDJIEEIOIEO
    AGTCGTCGTAGTGTTTTACCCCCATGTCA
    //
    >HSKWJSSWWOW
    AGTAGTAGTAGTAGGGGTTTTTTTTACCC
    //
    >ADJHFHIOHFO
    ACGTGGGGGGGGGGTTATTACCCCCCCCA
    //
    >DTEEIJEJEOJE
    TTTTTTTTTTGGGGGGGGGACCCCCCCAT

3) I am trying to print only those FASTA records whose headers are in my fastaheaders.txt file, i.e., >DFHGSUEIEEK, >JKDHUEIEEOE and >KDJIEEIOIEO.

4) The problem is that my program does that, but the output looks like this:

    >DFHGSUEIEEK
    >DFHGSUEIEEK
    >DFHGSUEIEEK
    >DFHGSUEIEEK
    ACGTCGTACGATCGATCAGTACGTACGAT
    ACGTCGTACGATCGATCAGTACGTACGAT
    ACGTCGTACGATCGATCAGTACGTACGAT
    ACGTCGTACGATCGATCAGTACGTACGAT
    //
    >JKDHUEIEEOE
    >JKDHUEIEEOE
    >JKDHUEIEEOE
    >JKDHUEIEEOE
    >JKDHUEIEEOE
    ACGATGCGTACAGTACAGTACAGTACAGT
    ACGATGCGTACAGTACAGTACAGTACAGT
    ACGATGCGTACAGTACAGTACAGTACAGT
    ACGATGCGTACAGTACAGTACAGTACAGT

5) Here is my program code. Can you please help me find what is going wrong?

    #!/usr/bin/perl
    open(FD, "<fastaheaders.txt");
    @headers = <FD>;
    print "Enter the file name:";
    $fn=<STDIN>;

    open(FH,$fn) || die "Error opening the file:$fn";
    while (<FH>)
    {
        foreach $head(@headers)
        {
            if (($head =~ /$_/) .. ($_ =~ /\/\//))
            {
                print "$_";
            }
        }
    }
    close FH;
    close FD;
    print "Loop Finished!";

Replies are listed 'Best First'.
Re: While loop printing only one fasta header file
by GrandFather (Saint) on Sep 11, 2008 at 04:28 UTC

    There is so much broken in that small code sample that I'd simply throw it away and start again.

    For a start, if you need to check for the existence of something you ought to be using a hash, so put your header lines into a hash.

    Next, never reread a file if you don't have to. The intent (although it fails in practice) of your while loop is to reread the input file for each header line you are looking for - way bad way to go! Instead make the while loop the outer loop so you read the large data file once only, then look up the hash in the body of the loop to see if you have a line you want to process.

    You don't need to interpolate a variable into a string for it to be printed and if it's the default variable you don't need to specify it at all for print.

    Ponder this somewhat cleaned up version:

        use strict;
        use warnings;

        my $headerFile = <<DATA;
        >DFHGSUEIEEK
        >JKDHUEIEEOE
        >KDJIEEIOIEO
        DATA

        my $dataFile = <<DATA;
        >DFHGSUEIEEK
        ACGTCGTACGATCGATCAGTACGTACGAT
        >JKDHUEIEEOE
        ACGATGCGTACAGTACAGTACAGTACAGT
        >KDJIEEIOIEO
        AGTCGTCGTAGTGTTTTACCCCCATGTCA
        >HSKWJSSWWOW
        AGTAGTAGTAGTAGGGGTTTTTTTTACCC
        >ADJHFHIOHFO
        ACGTGGGGGGGGGGTTATTACCCCCCCCA
        >DTEEIJEJEOJE
        TTTTTTTTTTGGGGGGGGGACCCCCCCAT
        DATA

        open my $FD, '<', \$headerFile;
        my %headers = map {$_ => 0} <$FD>;
        close $FD;

        open my $FH, '<', \$dataFile;
        exists $headers{$_} and print while <$FH>;
        close $FH;

    Prints:

        >DFHGSUEIEEK
        >JKDHUEIEEOE
        >KDJIEEIOIEO
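    The version above prints only the matching header lines. A minimal sketch of how the same hash lookup might be extended to print each wanted record in full is shown here. The helper name `select_records`, the sample data, and the treatment of `//` as a separator to skip are assumptions for illustration, not part of the original reply.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of every record whose header
# is a key of %$wanted. A record runs from a '>' header line up to the
# next '>' header line.
sub select_records {
    my ($wanted, @lines) = @_;
    my @out;
    my $keep = 0;
    for my $line (@lines) {
        chomp(my $text = $line);
        next if $text eq '//';    # assumed record separator, skipped
        # A header line decides whether the following lines are kept.
        $keep = exists $wanted->{$text} if $text =~ /^>/;
        push @out, $text if $keep;
    }
    return @out;
}

my %wanted = map { ($_ => 1) } ('>DFHGSUEIEEK', '>KDJIEEIOIEO');
my @data = (
    ">DFHGSUEIEEK\n", "ACGTCGTACGATCGATCAGTACGTACGAT\n", "//\n",
    ">JKDHUEIEEOE\n", "ACGATGCGTACAGTACAGTACAGTACAGT\n", "//\n",
    ">KDJIEEIOIEO\n", "AGTCGTCGTAGTGTTTTACCCCCATGTCA\n",
);

my @selected = select_records(\%wanted, @data);
print "$_\n" for @selected;
```

    The data file is still read only once, and each header costs a single hash lookup however many headers are wanted.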

    Perl reduces RSI - it saves typing
Re: While loop printing only one fasta header file
by NetWallah (Canon) on Sep 11, 2008 at 04:16 UTC
    It appears that you have your loops backwards (and are missing file CLOSE statements).

    I think your intent will be fulfilled if you move the "foreach (@headers)" loop inside the "while (<FH>)" loop.

         Have you been high today? I see the nuns are gay! My brother yelled to me...I love you inside Ed - Benny Lava, by Buffalax

Re: While loop printing only one fasta file
by jwkrahn (Abbot) on Sep 11, 2008 at 04:41 UTC

    The first header in $line reads through the whole file and then for the second and subsequent headers the filehandle FH will return eof.   You need to either open $fn inside the foreach loop or use seek to return the filehandle to the beginning of the file inside the foreach loop.
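    A minimal sketch of the seek option, assuming an in-memory string as a stand-in for the real data file so that it runs on its own. The variable names and sample data are hypothetical.

```perl
use strict;
use warnings;

# In-memory stand-in for the data file (an assumption for this sketch).
my $data = ">AAA\nACGT\n>BBB\nTTTT\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @found;
for my $header ('>BBB', '>AAA') {
    # Rewind before every pass; without this, the second pass would
    # start at end-of-file and read nothing.
    seek $fh, 0, 0 or die "Cannot seek: $!";
    while (my $line = <$fh>) {
        chomp $line;
        push @found, $line if $line eq $header;
    }
}
close $fh;

print "$_\n" for @found;
```

    This works, but it rereads the whole file once per header, which is why reading the file once and looking headers up in a hash is usually the better choice.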

One more problem
by ashnator (Sexton) on Sep 11, 2008 at 07:07 UTC
      Hi!
      Don't you think that mixing the implicit variable ($_) with an explicit one ($head) is needlessly confusing?
      I think that, where you use $_, it no longer refers to the outermost loop:
      it has probably been reused by the inner one.
      I'd rewrite the loop part of your script as follows.
      I also removed the test for // from the if statement (I hope that's right):

          while ($Fh=<FH>) {
              chomp($Fh);
              foreach $head(@headers) {
                  chomp($head);
                  if ($head =~ /$Fh/) {
                      print "$Fh\n";
                  }
              }
          }

      One remark (about efficiency):
      I suppose you have (much?) more lines in the data file than in the keys file (fastaheaders.txt), so you should invert the order of the loops:

          foreach line (data file)
              foreach line (keys file)
                  if found
                      print AND break inner loop (last)
                  endif
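      The pseudocode above might look like this in Perl. This is only a sketch: the arrays stand in for the two files, the sample values are made up, and an exact string comparison is assumed instead of a regex match.

```perl
use strict;
use warnings;

# Stand-ins for the keys file and the data file (hypothetical values).
my @headers = ('>DFHGSUEIEEK', '>JKDHUEIEEOE');
my @data    = ('>DFHGSUEIEEK', 'ACGT', '>HSKWJSSWWOW', 'AGTA',
               '>JKDHUEIEEOE', 'ACGA');

my @matches;
for my $line (@data) {            # outer loop: data file, read once
    for my $head (@headers) {     # inner loop: keys
        if ($line eq $head) {
            push @matches, $line;
            last;                 # stop scanning keys after the first match
        }
    }
}

print "$_\n" for @matches;
```

      Leaving the inner loop with `last` guarantees that each data line is printed at most once, even if the keys file contains the same header twice.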

      Try to only put <c> or <code> tags around the code itself and use <p> tags around your regular message. This makes your question more readable.

      The logic problems in your code are centered on the range (..) operator. You can use Super Search to search the Tutorials for the "range operator" to learn more about it.

      The range operator is a flip-flop, like a switch. It returns true when it's on, and false when it's off. Once the left-hand side of the .. becomes true, the switch turns on. The "switch" will stay on (and keep returning true) until it is triggered off. It is triggered off once the condition to the right of the .. becomes true. One thing to remember is that it still returns true on the evaluation that switches it off (see the tutorials; it is hard to explain briefly).

      If your program had only one loop you probably wouldn't notice this behavior. With a nested loop, not so good. After a matching header is found in the line, the inner loop checks every other header. The range operator ("switch"/flip-flop/..) is still returning true, and every time you check another header, the last line read into $_ is printed.
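      A small sketch of the flip-flop in a single loop, where it behaves as intended. The sample lines and the `>WANTED` header are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: two records, each ended by a '//' line.
my @lines = ('>WANTED', 'ACGT', '//', '>OTHER', 'TTTT', '//');

my @printed;
for (@lines) {
    # Switches on at the line matching the left side and stays on
    # through the line matching the right side, inclusive.
    push @printed, $_ if /^>WANTED/ .. m{^//};
}

print "$_\n" for @printed;
```

      Here the operator is evaluated exactly once per input line, so each line in the range is collected once. Putting the same test inside an inner loop over the headers evaluates it several times per line, which is where the repeated output comes from.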

      and here's my take:

          #!/usr/bin/perl
          use warnings;   # YOU FORGOT THESE!
          use strict;

          open HEADF, '<', 'fastaheaders.txt'
              or die "can't open fastaheaders.txt: $!";
          my @headers = <HEADF>;
          close HEADF;
          chomp @headers;   # drop newlines so they don't end up inside the regexp

          # create a regexp to search for all headers at once
          # like this: (BLABLABLA|ASKDJFASD) will match either
          # of the two. quotemeta escapes any special characters.
          my $regexp = '(' . join('|', map { quotemeta($_) } @headers) . ')';

          print 'Enter filename: ';
          my $fasta_fn = <STDIN>;
          chomp $fasta_fn;   # remove newline, paranoia

          open FASTAF, '<', $fasta_fn
              or die "can't open $fasta_fn: $!";

          while (<FASTAF>) {
              # $_ = the line of text from the file now...
              # m!! is like using // to match text, m just tells
              # perl to use a different surrounding character: !
              # m!$regexp!o by itself matches against $_ automagically!
              # (the o says only read the value in $regexp once,
              # it won't change anyway)
              # print by itself prints $_ automagically!
              print if (m!$regexp!o .. m!//! and not m!//!);
          }
          close FASTAF;

      Untested! READ Tutorials or perlop for the m and .. operators and maybe perlrequick for regular expressions.

      I cannot understand this statement:

          if (($head =~ /$_/) .. ($_ =~ /\/\//))

      What are you doing here?

      In fact, the most sensible way to do your job is to load the FASTA sequence file into a hash:

          my %fasta_source;
          my $tmp_title;
          open my $inseq, '<', $file_fasta
              or die "can't open $file_fasta: $!";
          while (<$inseq>) {
              chomp;
              next if $_ eq '//';                    # skip record separators
              if (/^>/) {
                  $tmp_title = $_;                   # header starts a new record
              }
              elsif (defined $tmp_title) {
                  $fasta_source{$tmp_title} .= $_;   # append sequence to the record
              }
          }
          close $inseq;

      This puts the FASTA file into the hash %fasta_source, after which it is easy to do what you want.
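      Once the hash exists, printing the wanted records could be sketched as follows. The hash contents here are filled in by hand as an assumption (header line as key, sequence as value), and `@headers` stands in for the lines of fastaheaders.txt.

```perl
use strict;
use warnings;

# Assumed shape of the hash built above: header line => sequence string.
my %fasta_source = (
    '>DFHGSUEIEEK' => 'ACGTCGTACGATCGATCAGTACGTACGAT',
    '>HSKWJSSWWOW' => 'AGTAGTAGTAGTAGGGGTTTTTTTTACCC',
);

# Hypothetical wanted headers; the second is not in the hash.
my @headers = ('>DFHGSUEIEEK', '>KDJIEEIOIEO');

my @report;
for my $head (@headers) {
    next unless exists $fasta_source{$head};   # skip headers not in the file
    push @report, "$head\n$fasta_source{$head}\n";
}

print @report;
```

      This trades memory for simplicity: the whole FASTA file is held in the hash, which is fine for small files but may not suit very large ones.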