in reply to Extracting BLAST hits from a list of sequences

Given your datasets, perhaps the following will be helpful (it correctly returns the one BLAST entry which contains a matching fasta header):

use strict;
use warnings;

my %headers;

while (<>) {
    $headers{$1} = 1 if /^>(.+)/;
    last if eof;
}

local $/ = 'Query= ';

while (<>) {
    chomp;
    print $/ . $_ if /(.+)/ and defined $headers{$1};
}

Usage: perl script.pl headerFile blastFile [>outFile]

The optional >outFile at the end is shell redirection that sends the output to a file instead of the screen.

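The last if eof; in the first loop stops reading at the end of headerFile, so the second loop picks up with blastFile. The local $/ = 'Query= '; changes the input record separator, so each read returns one BLAST entry ('record') instead of one line, and chomp strips the trailing 'Query= ' from it. The header info after Query= (the first line of the record) is captured and the complete entry is printed if that header is in the hash.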
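
To see the record splitting in isolation, here is a minimal sketch that reads from an in-memory string rather than a file. The miniature BLAST report and the contents of %headers are made up for illustration; real BLAST output has much more in each entry.

```perl
use strict;
use warnings;

# Hypothetical miniature BLAST report (illustration only).
my $blast = <<'END';
BLASTN 2.2.28+

Query= seq1
Length=100
(hits for seq1)

Query= seq2
Length=80
(hits for seq2)
END

# Pretend these keys came from the fasta header file.
my %headers = ( seq2 => 1 );
my @kept;

open my $fh, '<', \$blast or die "Cannot open in-memory file: $!";
{
    local $/ = 'Query= ';    # each read now ends at the next 'Query= '
    while ( my $record = <$fh> ) {
        chomp $record;       # strips a trailing 'Query= ', not a newline
        # '.' does not match a newline, so $1 is the record's first line
        push @kept, $/ . $record if $record =~ /(.+)/ and $headers{$1};
    }
}
close $fh;

print @kept;
```

The first record read is the report preamble (everything before the first Query= ), which is skipped because its first line is not in the hash. Only the seq2 entry is printed, with the Query= prefix put back in front of it.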

Re^2: Extracting BLAST hits from a list of sequences
by no_slogan (Deacon) on Jan 20, 2014 at 16:44 UTC
    Using $/ is a good suggestion (and I upvoted it), but I feel that relying on the magic of the <> operator is unnecessarily clever in this case.

      Appreciate the upvote. It seemed only 'natural' to use <> in this case, since the OP already had files in @ARGV and using <> to process them avoided adding the code to do so.
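
For comparison, the explicit-filehandle alternative discussed in this subthread might look something like the sketch below. Nobody posted this version; the sub name matching_blast_entries and its variable names are made up here, and the matching logic is simply carried over from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the BLAST entries in $blast_file whose
# first line matches a fasta header found in $header_file.
sub matching_blast_entries {
    my ( $header_file, $blast_file ) = @_;

    # Collect the fasta headers (text after '>') as hash keys.
    my %headers;
    open my $hfh, '<', $header_file or die "Cannot open $header_file: $!";
    while ( my $line = <$hfh> ) {
        $headers{$1} = 1 if $line =~ /^>(.+)/;
    }
    close $hfh;

    # Read the BLAST report one 'Query= ' delimited record at a time.
    my @entries;
    open my $bfh, '<', $blast_file or die "Cannot open $blast_file: $!";
    local $/ = 'Query= ';
    while ( my $record = <$bfh> ) {
        chomp $record;    # remove the trailing 'Query= '
        push @entries, $/ . $record if $record =~ /(.+)/ and $headers{$1};
    }
    close $bfh;

    return @entries;
}

# Same command line as before: perl script.pl headerFile blastFile [>outFile]
print matching_blast_entries(@ARGV) if @ARGV == 2;
```

This is longer than the <> version, but each file is opened by name, so the last if eof; trick is no longer needed to tell the two files apart.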