I started working through your code to eliminate the system calls and modify the file I/O to use string files when I realised that toolic had already identified a major bug with your code. See his second bullet point in Re: Iterative BLAST(N) program. toolic also makes the very strong point that you should write some test code or otherwise validate your results using a test set of data.

It may be that the following self contained sample code provides a test bed for you to determine what you need to know:

#!/usr/bin/perl use strict; use warnings; my @roundData = (<<DATA1, <<DATA2); 2004112448 2004112448 100.00 738 0 0 1 738 1 + 738 0.0 1421 2004112448 2004113832 90.44 586 54 2 153 738 1 +6 599 0.0 660 2004112448 2004101124 89.67 368 37 1 361 728 1 + 367 3e-104 379 2004112448 2004105206 83.33 606 101 0 76 681 4 + 609 1e-100 367 2004112448 2004086271 85.95 363 51 0 341 703 8 +3 445 5e-75 281 2004112448 2004120286 80.98 205 39 0 208 412 3 +73 577 1e-19 97.6 2004112448 2004133966 91.43 70 6 0 618 687 261 + 192 9e-18 91.7 2004112448 2004136134 87.84 74 9 0 193 266 345 + 418 5e-13 75.8 2004112448 2004112239 84.34 83 13 0 461 543 10 +1 183 8e-09 61.9 2004112448 2004112239 89.36 47 5 0 366 412 6 + 52 2e-06 54.0 2004112448 2004097170 85.14 74 11 0 193 266 34 +6 419 3e-08 60.0 2004112448 2004127588 86.15 65 9 0 193 257 346 + 410 1e-07 58.0 2004112448 2004119982 86.15 65 9 0 193 257 346 + 410 1e-07 58.0 2004112448 2004100435 86.15 65 9 0 193 257 55 + 119 1e-07 58.0 2004112448 2004135256 82.56 86 15 0 184 269 33 +7 422 8e-06 52.0 2004112448 2004121539 81.93 83 15 0 193 275 34 +6 428 5e-04 46.1 2004112448 2004092379 81.93 83 15 0 193 275 34 +6 428 5e-04 46.1 2004099281 2004099281 100.00 844 0 0 1 844 1 + 844 0.0 1661 2004099281 2004142218 90.13 446 43 1 400 844 1 + 446 1e-149 529 2004099281 2004092058 81.88 458 82 1 376 832 7 +6 533 1e-63 244 2004099281 2004147330 83.64 110 18 0 424 533 5 +41 650 6e-13 75.8 2004099281 2004147330 82.86 70 12 0 230 299 34 +7 416 0.002 44.1 2004099281 2004146484 86.05 86 12 0 685 770 57 +7 662 6e-13 75.8 2004099281 2004146484 85.88 85 12 0 424 508 31 +6 400 2e-12 73.8 2004099281 2004146484 83.10 71 12 0 229 299 12 +1 191 5e-04 46.1 2004099281 2004106112 81.68 131 23 1 694 823 6 +13 743 2e-09 63.9 2004099281 2004147824 85.71 70 10 0 439 508 49 +0 559 4e-08 60.0 2004099281 2004153303 88.64 44 5 0 439 482 619 + 662 1e-04 48.1 2004099281 2004082897 81.52 92 17 0 664 755 10 + 101 1e-04 48.1 2004099281 2004142389 89.47 38 4 0 795 832 169 + 206 0.002 44.1 DATA1 2004082897 2004082897 100.00 810 0 0 1 810 1 + 810 0.0 1582 2004082897 2004156684 87.74 261 32 0 3 263 153 + 413 1e-69 264 2004082897 2004149613 80.35 514 99 2 289 801 1 + 513 2e-53 210 2004082897 2004146484 87.91 91 11 0 8 98 554 + 644 2e-18 93.7 2004082897 2004126309 81.10 164 31 0 8 171 80 + 243 4e-14 79.8 2004082897 2004106112 85.11 94 14 0 8 101 581 + 674 6e-13 75.8 2004082897 2004127134 95.45 44 2 0 127 170 745 + 788 9e-12 71.9 2004082897 2004127134 91.18 34 3 0 223 256 844 + 877 0.002 44.1 2004082897 2004117857 79.46 185 38 0 621 805 4 +5 229 9e-12 71.9 2004082897 2004142219 87.69 65 8 0 25 89 904 + 968 6e-10 65.9 2004082897 2004142219 90.57 53 5 0 136 188 101 +5 1067 6e-10 65.9 2004082897 2004142389 100.00 29 0 0 571 599 60 +1 629 1e-07 58.0 2004082897 2004142389 96.43 28 1 0 143 170 170 + 197 1e-04 48.1 2004082897 2004092058 83.15 89 15 0 13 101 367 + 455 1e-07 58.0 2004082897 2004142218 87.27 55 7 0 10 64 265 + 319 2e-06 54.0 2004082897 2004117248 96.67 30 1 0 142 171 667 + 696 8e-06 52.0 2004082897 2004125016 96.55 29 1 0 571 599 571 + 599 3e-05 50.1 2004082897 2004125016 96.15 26 1 0 145 170 142 + 167 0.002 44.1 2004082897 2004099281 81.52 92 17 0 10 101 664 + 755 1e-04 48.1 2004082897 2004117249 93.10 29 2 0 571 599 244 + 272 0.008 42.1 2004083856 2004083856 100.00 774 0 0 1 774 1 + 774 0.0 1534 2004083856 2004128009 87.09 767 84 3 13 764 41 +2 1178 0.0 735 2004083856 2004100486 89.04 365 40 0 408 772 1 +23 487 1e-112 406 2004083856 2004112466 91.47 211 18 0 1 211 598 + 808 3e-73 276 2004083856 2004147543 84.54 194 30 0 346 539 8 +71 1064 2e-34 147 2004083856 2004131474 84.74 190 29 0 346 535 8 +86 1075 2e-34 147 2004083856 2004131474 87.80 41 5 0 61 101 613 + 653 0.008 42.1 2004083856 2004131474 82.61 69 12 0 138 206 69 +0 758 0.008 42.1 2004083856 2004097313 89.36 94 10 0 679 772 76 + 169 2e-22 107 2004083856 2004149567 90.91 44 4 0 73 116 235 + 278 5e-07 56.0 DATA2 iterblast(); sub iterblast { # settings my $myblast = "blastall -p blastn -m 8"; my $blastdb = 'foo'; my $evalue = 0.01; my $wordsize = 11; my $roundlimit = 4; print <<LOGMSG; Configurations: Output files : xxx\- Blast Engine : $myblast Target database : $blastdb E-value cut-off : $evalue Blast word size : $wordsize Blast round limit: $roundlimit Runs: LOGMSG # hits hash my %hits; my %queryids; my @fastaDbs; # Start the iterative blastn rounds for my $round (1 .. @roundData) { open my $parse, '<', \$roundData[$round - 1]; push @fastaDbs, ''; open my $fasta, '>', \$fastaDbs[-1]; # parse blast output (tab form) and add matched id's in hits h +ash while (defined (my $line = <$parse>)) { chomp $line; my @fields = split(/\t/, $line); ++$queryids{$fields[0]}; ++$hits{$fields[1]}; } # get header and sequence with 'fastacmd' and write to new fas +ta file my $hitcount; for my $hit (keys %hits) { next if exists $queryids{$hit}; # only write uniq ids to file print $fasta "fastacmd -d $blastdb -s 'lcl|$hit'\n"; $hitcount++; } close $fasta; close $parse; if ($hitcount) { print "Round $round has $hitcount unique hits\n\n"; next; } print "No hits in Round $round ==> STOP\n"; last; } print '-' x 20, "\n$_\n" for @fastaDbs; print "Total uniq hits found in all rounds: ", scalar keys %hits, +"\n"; }

The major interesting technique here (used in various ways in several places) is using a variable as a file. That avoids writing or reading any actual files at all, but allows most of the interesting processing code to remain intact.

True laziness is hard work

In reply to Re^5: Iterative BLAST(N) program by GrandFather
in thread Iterative BLAST(N) program by Tjuh

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.