hansoffate has asked for the wisdom of the Perl Monks concerning the following question:

For the most part, my script below works and generates a CSV file. However, sometimes it doesn't return a result, which leaves the columns of my CSV file misaligned. A sample input sequence that causes the alignment problem is included below.

Thanks for the help,
-Hans

use strict;
use Bio::Tools::Run::RemoteBlast;
use Getopt::Std;

my $usage = "\nusage: <script_name> <input_file> <output_file>\n".
            "Creates a CSV file with query_name, hit_name, score, and expect\n\n".
            "-b Y or N If Y, it will produce the blastoutput for each sequence query; Defaults N\n\n";

our($opt_b);
getopts('b:') or die $usage;

if(!defined($opt_b)) { $opt_b = 'N'; }

my $prog = 'blastp';
my $db = 'nr';
my $e_val = '1e-2';
my $v = 1;

my $infile = shift or die $usage;
my $outfile = shift or die $usage;
open OUT, ">$outfile" or die $usage;
print OUT "Query_Name,Description,Hit_Name,Score,Expect\n";

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(-prog => $prog,
                                                    -data => $db,
                                                    -readmethod => 'SearchIO',
                                                    -expect => 100);
my $v = 1;
#my $str = Bio::SeqIO->new(-file=>'smallinput.fna', -format => 'fasta' );
my $str = Bio::SeqIO->new(-file=> $infile, -format => 'fasta' );
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Arabidopsis [ORGN]'; ###animals not plants

while( my $input = $str->next_seq()) {
  my $r = $remoteBlast->submit_blast($input);
  print STDERR "waiting..." if ($v > 0);
  while(my @rids = $remoteBlast->each_rid) {
    foreach my $rid (@rids) {
      my $rc = $remoteBlast->retrieve_blast($rid);
      if(!ref($rc)) {
        if($rc < 0 ) {
          $remoteBlast->remove_rid($rid);
        }
        print STDERR "." if ($v > 0);
        sleep 5;
      } else {
        my $result = $rc->next_result();
        #### save the output
        if($opt_b eq 'Y') {
          my $filename = $result->query_name()."\.out";
          $remoteBlast->save_output($filename);
        }
        $remoteBlast->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), " retrieved\n";
        while (my $hit = $result->next_hit) {
          next unless ($v > 0);
          #print OUT $result->query_name().",".$result->query_description().",".$hit->name().",";
          if(defined($hit)) {
            print OUT $result->query_name().",".$hit->name().",";
          } else {
            print OUT "\n";
          }
          while (my $hsp = $hit->next_hsp) {
            #print OUT $hsp->score().",".$hsp->expect()."\n";
            if(defined($hsp)) {
              print OUT $hsp->score().",".$hsp->expect().",".$result->query_description()."\n";
            } else {
              print OUT "\n";
            }
          }
        }
      }
    }
  }
}
close OUT;
inputfile
>gi|117625301|ref|YP_854502.1| hydrogenase 2 protein HybA [Escherichia coli APEC O1]
MNRRNFIKAASCGALLTGALPSVSHAAAENRPPIPGSLGMLYDSTLCVGCQACVTKCQDINFPERNPQGE
QTWSNNDKLSPYTNNIIQVWTSGTGVNKDQEENGYAYIKKQCMHCVDPNCVSVCPVSALKKDPKTGIVHY
DKDVCTGCRYCMVACPYNVPKYDYNNPFGALHKCELCNQKGVERLDKGGLPGCVEVCPAGAVIFGTREEL
MAEAKKRLALKPGSEYHYPRQTLKSGDTYLHTVPKYYPHLYGEKEGGGTQVLVLTGVPYENLDLPKLDDL
STGARSEHVQHTLYKGMMLPLAVLAGLTVLVRRNTKNDHHDGGDDHES

Re: Bio::Search::Hit::BlastHit unreturned results misformats csv output
by jrsimmon (Hermit) on Jun 30, 2009 at 17:50 UTC
    How sure are you that the methods you are having trouble with return undef, and not '', 0, -1, etc., when a problem is encountered?
      I'm not that sure; however, this is what the next_hit() method does according to CPAN:
      Title   : next_hit
      Usage   : while( $hit = $result->next_hit()) { ... }
      Function: Returns the next available Hit object, representing
                potential matches between the query and various entities
                from the database.
      Returns : a Bio::Search::Hit::HitI object or undef if there are no more.
      Args    : none
      The same is true for next_hsp(). Therefore, I thought that if I threw in a check to see whether they are defined, it would only print when they are. Is this incorrect?
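One way to test that reasoning without calling BLAST is a minimal sketch like the one below. It assumes a hypothetical `make_iterator` helper (not part of BioPerl) that mimics the documented contract: return each item in turn, then undef when there are no more. The loop has the same shape as the script's `while (my $hsp = $hit->next_hsp)`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for next_hit()/next_hsp(): hands back each item in
# turn, then undef once the list is exhausted.
sub make_iterator {
    my @items = @_;
    return sub { return shift @items };
}

# Walk an iterator the way the script does and count which branch runs.
sub count_branches {
    my ($next) = @_;
    my %seen = (if_branch => 0, else_branch => 0);
    while (my $item = $next->()) {
        if (defined $item) { $seen{if_branch}++ }
        else               { $seen{else_branch}++ }   # the loop condition already filtered undef
    }
    return \%seen;
}

my $with_items = count_branches(make_iterator('hsp1', 'hsp2'));
my $empty      = count_branches(make_iterator());

printf "two items: if=%d else=%d\n", @{$with_items}{qw(if_branch else_branch)};
printf "no items:  if=%d else=%d\n", @{$empty}{qw(if_branch else_branch)};
```

If the real methods behave as documented, the `else` branch never runs, because the `while` condition stops the loop as soon as undef comes back. With an empty iterator neither branch runs, so a newline that only the inner loop prints would never be written for a hit without HSPs. That is only a possible explanation for the misalignment, not a confirmed one.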
        I'm not in the least familiar with the modules you're using, so I may not be much help. That said, I'd definitely confirm the behavior of the module you're using with something like this:
        open(DBG, ">$some_log_file") or die;
        while (my $hit = $result->next_hit) {
          next unless ($v > 0);
          #print OUT $result->query_name().",".$result->query_description().",".$hit->name().",";
          if(defined($hit)) {
            my $debug_qry = $result->query_name();
            my $debug_hit = $hit->name();
            print DBG "$debug_qry\t$debug_hit\t$hit\n";
            print OUT $result->query_name().",".$hit->name().",";
          } else {
            print OUT "\n";
          }
          while (my $hsp = $hit->next_hsp) {
            #print OUT $hsp->score().",".$hsp->expect()."\n";
            if(defined($hsp)) {
              my $debug_score = $hsp->score();
              my $debug_expect = $hsp->expect();
              my $debug_qry_desc = $result->query_description();
              print DBG "$debug_score\t$debug_expect\t$debug_qry_desc\t$hsp\n";
              print OUT $hsp->score().",".$hsp->expect().",".$result->query_description()."\n";
            } else {
              print OUT "\n";
            }
          }
        }
        You could use Data::Dumper too, but that's probably overkill for what you need at this point. The point is that you need to verify whether you're getting a bad return from the method or whether you're handling a proper return badly.
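For the Data::Dumper route, a minimal sketch might look like the following. The `describe_return` helper and the `My::FakeHit` class are made up for illustration and are not part of BioPerl; the values in the loop stand in for whatever the real methods hand back (undef, '', 0, -1, or an object).

```perl
use strict;
use warnings;
use Data::Dumper;

# Keep dumps short and stable: sorted keys, and only one level deep so a
# large object does not flood the log.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Maxdepth = 1;

# Hypothetical helper: say in words what a method handed back.
sub describe_return {
    my ($value) = @_;
    return 'undef'                           unless defined $value;
    return 'reference (' . ref($value) . ')' if ref $value;
    return 'empty string'                    if $value eq '';
    return "plain scalar '$value'";
}

# Stand-in values; 'My::FakeHit' is a made-up class, not a BioPerl one.
my $fake_hit = bless { name => 'example_hit' }, 'My::FakeHit';

for my $value (undef, '', 0, -1, $fake_hit) {
    print STDERR describe_return($value), "\n";
}

# Dump the structure itself when the one-line description is not enough.
print STDERR Dumper($fake_hit);
```

In the real script, the same helper could be called on the return values of next_hit() and next_hsp() before they are used, which would show directly whether an unexpected value or an empty result set is behind the broken rows.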