WonkoTheSane has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks I'm a first time poster so please bear with me :)

Problem: My bioperl searchIO object is empty, even though the blast output file is not.

I have searched high and low for the past couple of hours for an explanation to the output that I get from my perl script. I hope someone here can shed some light on it.

Details: I am reading a set of nucleotide sequences (solexa reads) from a file in FASTA format. The nucleotide sequences gets translated into protein sequences using all 6 reading frames. If any protein sequences contain stop codons these sequences are discarded. The remaining sequences are then passed on to a Bio::Tools::Run::StandAloneBlast blast object and the method blastall() is called. The method yields a Bio::SearchIO object. From the SearchIO object I should be able to retrieve all the results from my blast using the method next_result() and from that object I should be able to retrieve the blast hits using the method next_hit(). But instead I get this error message telling me that the variable storing the next_result() output is undefined.
Can't call method "next_hit" on an undefined value at read_blast_stats +.pl line 150, <GEN2> line 253 (#1) (F) You used the syntax of a method call, but the slot filled by t +he object reference or package name contains an undefined value. Som +ething like this will reproduce the error: $BADREF = undef; process $BADREF 1,2,3; $BADREF->process(1,2,3); Uncaught exception from user code: Can't call method "next_hit" on an undefined value at read_blast_s +tats.pl line 150, <GEN2> line 253. at read_blast_stats.pl line 150
I store the results from the blast search in a file blast.out. After running my script and therby also blasting. This file is not empty, but something is wrong in this file too because only the results from one of the blasted sequences are stored, not all of them as one would have expected.

System:
  • Linux dist. : Red Hat 3.4.6-8
  • Perl v5.8.5 built for x86_64-linux-thread-multi
  • BioPerl v. 1.5.2
  • blastall v. 2.2.18


  • Code:
    # Creates BioPerl input sequence stream. my $in_seq = Bio::SeqIO->new(-file => "<$input_file", -format => $se +q_format ); # Extracting sequences from stream. print("- reading and translating sequences...\n"); my $seq_counter = 0; my $unique_count = 0; my @translated_seqs; while (my $seq = $in_seq->next_seq()) { $seq_counter++; print("-\tsequences read: $seq_counter\r"); my $temp_seq; # Translates sequence and stores the translated sequences if they +do not # contain any stop codons. # Reading frames 1, 2 and 3. $temp_seq = $seq->translate(); $temp_seq->display_id("test_id-$unique_count"); $unique_count++; push(@translated_seqs, $temp_seq) unless ($temp_seq->seq() =~ /\*+ +/); $temp_seq = $seq->translate(-frame => 1); $temp_seq->display_id("test_id-$unique_count"); $unique_count++; push(@translated_seqs, $temp_seq) unless ($temp_seq->seq() =~ /\*+ +/); $temp_seq = $seq->translate(-frame => 2); $temp_seq->display_id("test_id-$unique_count"); $unique_count++; push(@translated_seqs, $temp_seq) unless ($temp_seq->seq() =~ /\*+ +/); # Creates the reverse complement sequence. my $rc_seq = $seq->revcom(); # Reading frames -1, -2, and -3 $temp_seq = $rc_seq->translate(); $temp_seq->display_id("test_id-$unique_count"); $unique_count++; push(@translated_seqs, $temp_seq) unless ($temp_seq->seq() =~ /\*+ +/); $temp_seq = $rc_seq->translate(-frame => 1); $temp_seq->display_id("test_id-$unique_count"); $unique_count++; push(@translated_seqs, $temp_seq) unless ($temp_seq->seq() =~ /\*+ +/); $temp_seq = $rc_seq->translate(-frame => 2); $temp_seq->display_id("test_id-$unique_count"); $unique_count++; push(@translated_seqs, $temp_seq) unless ($temp_seq->seq() =~ /\*+ +/); } # Blasting... my @params = (-program => 'blastp', -database => '/share/raid1/databas +e/ftp.ncbi.nih.gov/blast/db/20090602/nr', -expectation => '200000', - +Word => '2', -Matrix => 'PAM30', -Gapcost => '9.1', -FilterString => +'F', _READMETHOD => 'blasttable' , -m => '8', -outfile => 'blast.out' +); my $blaster = Bio::Tools::Run::StandAloneBlast->new(@params); my $blast_report = $blaster->blastall(@translated_seqs); # Parsing results my $result = $blast_report->next_result(); my $hit = $result->next_hit();


    My thoughts:
  • Because my sequences are created the way that they are, the sequence objects do not contain any other information than the protein sequence. So I thought, maybe blastall is behaving funny because of that. But after giving all sequences unique IDs I still get the same result
  • Blastall will give me the expected output file (with multiple results) if I, after translating the sequences write them all to the to the file blast.in and then calls blastall from the script using that input. But it must be possible to do this without having to write to the disc using BioPerl.
  • I tried taking the appropriate output created without bioperl like described above and process it using Bio::SearchIO. This works fine. So I guess the problem must lie somewhere around StandAloneBlast. I did also try to exchange my array of sequence objects with just one sequence object, but it made no difference.


  • Workaround: The following code is my far-from-optimal workaround so far. I really don't want to make that blast.in file. If you don't have a solution but a better workaround, then I would love to hear it. I have a ton of data and almost unlimited memory so IO operations is bad mmkay ;)
    # This code is inserted instead of # the Blasting and parsing part of the original code. # Creating blast.in open(OUTPUT_FILE, "> blast.in") or die "Could not create the output fi +le \"blast.in\"\n"; # Writing blast.in sequence file in FASTA format foreach(@translated_seqs){ print OUTPUT_FILE (">".$_->id()."\n".$_->seq()."\n");} # Performing blast `blastall -i "blast_beta.in" -d /share/raid1/database/ftp.ncbi.nih.gov +/blast/db/20090602/nr -p blastp -e 200000 -W 2 -M PAM30 -G 9.1 -F F - +m 8 -o "blast.out"`; # Creating SearchIO my $blast_report = Bio::SearchIO->new(-format => "blasttable", -file +=> "blast.out"); # Testing SearchIO for data. my $result = $blast_report->next_result(); my $hit = $result->next_hit(); print("Number of hits: ".$hit->length()."\n");


    Even if you couldn't help me, I thank you for taking the time to read this. Best wishes :)

    Edit: Changed the filehandle from $temp_file to OUTPUT_FILE.

    Replies are listed 'Best First'.
    Re: BioPerl StandAloneBlast is returning unexpected undefined SearchIO object
    by BrowserUk (Patriarch) on Jun 25, 2009 at 16:22 UTC

      I may be misreading the code (little wonder), but it looks to me that even if you pass a reference to an array or Bio object, the module you're using writes that data to a temporary file in order to pass it to the executable anyway.

      From my (limited) perspective, it looks like it might be a whole lot easier to start the executable directly using a piped open and pass the data through yourself. At least you'd know what was going on.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        I may be misreading the code (little wonder)
        Huh? is my code that terrible? Or is it because its hard to understand if one is not familiar with BioPerl? Because that I can understand :)

        But your making a good point, even though it still don't explain the error. But I think that it will definitely improve my workaround. Thanks!
          Huh? is my code that terrible?

          Not your code (actually I didn't look at it that closely:); I was referring to the code inside Bio::Tools::Run::StandAloneBlast which I find extremely opaque.


          Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
          "Science is about questioning the status quo. Questioning authority".
          In the absence of evidence, opinion is indistinguishable from prejudice.
    Re: BioPerl StandAloneBlast is returning unexpected undefined SearchIO object
    by John M. Dlugosz (Monsignor) on Jun 25, 2009 at 15:52 UTC
      It appears that the code
      $blast_report->next_result();
      is returning undef instead of a blessed object. Is that object (StandAloneBlast) your code? Maybe it has error reporting options, or is supposed to return undef sometimes...?
        StandAloneBlast is a module in BioPerl (www.bioperl.org) and therefore kind of a black box..
        With that said, it should only return undef if the blast search that it did returns empty handed. But from doing the blast search manually (in my workaround) it can be seen that the search is not empty.
          Than it indicates a bug in that module. (That could be a documentation bug if its just not matching what you did by hand)
    Re: BioPerl StandAloneBlast is returning unexpected undefined SearchIO object
    by Anonymous Monk on Jun 25, 2009 at 16:20 UTC
      If you tried running this code for a single sequence, translating it into the 6 frames and then tried blasting it what would come?? I did not understand this part "If any protein sequences contain stop codons these sequences are discarded" because stop codons are there in the Nucleotide Sequence and not the protein sequence, this can make me doubtful on what you are trying to achieve but let me try giving you some walkaround clues.... I will just suggest that you translate each nucleotide sequence into its 6 frames, and for each set of the forward and backward frame you make decisions on which sequences to include for translation.. i.e. remove the sequences that have stop codons in them before translating the remaining ones into proteins and blasting...I have noticed another thing, you are using blastp which accepts amino acids as input sequences. I hope I have been of use in these clues Thanks, best of luck
        If you tried running this code for a single sequence, translating it into the 6 frames and then tried blasting it what would come??
        Unfortunately I get the exact same error :(

        Ok I might have been a little unclear on this. I translate one nucleotide sequence into six amino acid sequences. A stop codon in the nucleotide sequence will be translated into an asterisk *. It is therefore easier to sort out the sequences after translating them.

        I'm sorry I don't understand your concern regarding my use of blastp. I am blasting protein sequences (amino acids).
          get the six different reading frames from the nucleotides, decide on which ones to remove because they got any one of the three stop codons, do that first, then translate the rest of the sequences into proteins and do the blasting. that would save you from using the * to mark where a stop codon is...I know I am not able to give you a solution but I am just suggesting something that could land you anywhere near it... I will really advice you to use the * only in Typeglob notations in Perl.. I had a feeling that you used blastp on a protein sequence cuz I did not read your code in a clear way. Be a bit more neat instead of repeating the same if statement over and over, your program could be a way shorter and effective too...
          Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind
    Re: BioPerl StandAloneBlast is returning unexpected undefined SearchIO object
    by arun_kom (Monk) on Jun 26, 2009 at 15:03 UTC
      I am not sure if i got it right but I believe the error is because Blastall expects a Bio::SeqIO object. To convert sequences to a Bio::SeqIO object which you can directly pass on to your blastall call, you may use IO::String (see bioperl documentation on SeqIO)

      The following worked for me. First define a string variable to store sequences in fasta format my $sequence_string = "";
      Then replace these lines in your code
      push(@translated_seqs, $temp_seq) unless($temp_seq->seq() =~ /\*+/);
      with
      $sequence_string = $sequence_string.">".$temp_seq->display_id."\n".$temp_seq->seq."\n" unless($temp_seq->seq =~ /\*+/);
      (As biohisham suggested Re^3: BioPerl StandAloneBlast is returning unexpected undefined SearchIO object, there may be better ways to select open reading frames.)

      Before calling Blast, convert the string to SeqIO object as follows:
      my $stringfh = new IO::String($sequence_string); my $seqio = new Bio::SeqIO(-fh => $stringfh, -format => 'fasta'); my $blaster = Bio::Tools::Run::StandAloneBlast->new(@params);
      Loop through sequences to blast and continue with whatever you want to do with your blast report
      while( my $seq = $seqio->next_seq ) { my $blast_report = $blaster->blastall($seq); ... while( my $BLAST_result = $blast_report->next_result ) { ... while( my $hit = $BLAST_result->next_hit ) { ... } } }
      The parser worked fine with the blast report in default format but failed to read the ‘blasttable’ format. I haven't understand the reason for this.
        You have to tell StandAloneBlast it will be returning blasttable format, the default is 'blast' format and the parser doesn't try to guess.

        I'm really confused why you are doing the translation yourself and not running BLASTX, but anyways the code for doing translation is described below?

        The StandAloneBlast module is pretty fragile unfortunately - I really prefer to just code the cmdline parameters myself. But I think the problem from before is that it assumes you are passing in a Bio::Seq object (or Bio::PrimarySeqI interface object).

        Here's a part of a script to generate your query sequences in FASTA format and then run your blast with a system call or better yet do the parsing on the fly... The translate function has a lot of capabilities as well like finding ORFs for you.

        use Bio::SeqIO; use Bio::SearchIO; my $input = Bio::SeqIO->new(-format => 'fasta', -file => "inputseqs.fa +"); my $output = Bio::SeqIO->new(-format => 'fasta', -file => ">translated +.fa"); while(my $seq = $input->next_seq ){ my $id = $seq->id; # fwd strand for my $frame ( 0..3 ) { my $translated = $seq->translate($frame); $translated->id($id. "_F$frame"); $output->write_seq($translated); } # rev strand my $revcom = $seq->revcom; for my $frame ( 0..3 ) { my $translated = $revcom->translate($frame); $translated->id($id. "_R$frame"); $output->write_seq($translated); } $output->close(); } my $blastexe = "blastall -p blastp -d swissprot -e 1e-3 -i translated. +fa -m 9"; open(my $fh => "$blastexe |" ) || die $!; my $blastparser = Bio::SearchIO->new(-format => 'blasttable', -fh => $ +fh); # now do something with this as a SearchIO object # if you wanted the full alignment details, change blasttable to 'blas +t' and remove the '-m 9' in the exe string. # OR `$blastexe -o result.BLASTP.tab`;