Hello everyone :D

For a school project, I'm trying to create an iterative blastn progam. The idea is that a fasta file is n-blasted against a database created from that same fasta file.

So in the first round if there are any hits found, those hit id's with their respective sequences are used in the next round of blast. This will go on until the roundlimit is reached or there are no more hits found. The thing is that there are several iterations of hits. Many of those hits will be repeated in more than one iteration.

The end result would be a hash containing all the uniq hits found in the blast rounds. Sequences would be retrieved by the hit id's and finally a fasta file of highly similar sequences would have been made.

So far I have a working example although I'm not sure if does exactly what I want it to. Especially the section "#get header and sequence..." I'm not too sure if I'm doing that right. I could use some feed back on this or the whole progam in general.

Thank u very much.

sub iterblast{ # settings my $myblast = "blastall -p blastn -m 8"; my $starterseq = "/share/home/tjuh/Thema12-2011/Project/testblast/ +matchedhits.fasta"; my $blastdb = "/share/home/tjuh/Thema12-2011/Project/databases/ +termite_hindgut/hindgut_db"; my $outputroot = "/share/home/tjuh/Thema12-2011/Project/testblast/ +test1"; my $evalue = 0.01; my $wordsize = 11; my $roundlimit = 4; # output files: my $seq_db_fasta = $outputroot . ".tmp"; my $seq_db_all = $outputroot . ".all"; my $logfile = $outputroot . ".log"; # remove existing files system("rm -rf $seq_db_fasta"); system("rm -rf $seq_db_all"); system("rm -rf $logfile"); open(LOG,">$logfile"); print LOG "Configurations:\n"; print LOG "Input file : $starterseq\n"; print LOG "Output files : $outputroot\-\n"; print LOG "Blast Engine : $myblast\n"; print LOG "Target database : $blastdb\n"; print LOG "E-value cut-off : $evalue\n"; print LOG "Blast word size : $wordsize\n"; print LOG "Blast round limit: $roundlimit\n\n"; print LOG "Runs:\n"; # hits hash my %hits = (); my %queryids = (); # Start the iterative blastn rounds for (my $round = 1; $round <= $roundlimit; $round++) { my $hitcount = 0; my $infile = ""; # the initial fasta file is only run once if ($round > 1) { $infile = $seq_db_fasta; } else { $infile = $starterseq; } my $saveblst = $outputroot . ".txt"; system("rm -rf $saveblst"); # print LOG info my $curtime = `date`; chomp($curtime); print LOG "$curtime - Blast seq: $infile\n"; print LOG "$curtime - Blast db : $blastdb\n"; # run blast system("$myblast -i $infile -d $blastdb -o $saveblst -e $evalu +e -W $wordsize"); open(PARSE, "<$saveblst"); open(FASTA, ">$seq_db_fasta"); # parse blast output (tab form) and add matched id's in hi +ts hash my @lines = (); while (my $line = <PARSE>){ chomp $line; @lines = split(/\t/, $line); $queryids{$lines[0]}++; $hits{$lines[1]}++; } #print "Hash size queryids: ", scalar keys %queryids, "\n" +; #print "Hash size hits: ", scalar keys %hits, "\n"; # get header and sequence with 'fastacmd' and write to new + fasta file while (my $key = each %hits){ #print $key . "\n"; if (exists($queryids{$key})){ # only writes uniq ids t +o file #print $key . "\n"; $queryids{$key} = 1; } else { system("fastacmd -d $blastdb -s 'lcl|$key' >> $seq +_db_fasta"); $hitcount++; } } print LOG "Round $round has $hitcount unique hits\n\n"; # evaluate if to run for next blast round if ($hitcount == 0) { print LOG "Hitcount: $hitcount through Round $round == +> STOP\n"; $round = $roundlimit; } close(FASTA); close(PARSE); } print "Total uniq hits found in all rounds: ", scalar keys %hits, +"\n"; close(LOG); }

In reply to Iterative BLAST(N) program by Tjuh

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.