In that case you could collect your blast hits in a hash with SIZE as the key. An existing hash entry should then only be replaced by a blast hit of the same SIZE when that blast hit has an NPNO value. This assumes that there is at most one blast hit with an NPNO in any result, but you seem to imply that.
As hash values you could store either a reference to an array holding the other fields, or one string containing all the lines of a blast hit.
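If you go the array route, remember that a Perl hash value has to be an array *reference*. Below is a minimal sketch under some assumptions: the hits are already parsed into hashrefs with the hypothetical keys `size`, `query`, `gino` and `npno` (undef when a hit has no NPNO), the helper name `pick_hits` is made up, and the sample values are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps one hit per SIZE, preferring hits with an NPNO.
# Each stored value is an arrayref [query, gino, npno].
sub pick_hits {
    my @hits = @_;
    my %results;
    for my $hit (@hits) {
        my $size = $hit->{size};
        my $row  = [ $hit->{query}, $hit->{gino}, $hit->{npno} ];
        if ( defined $hit->{npno} ) {
            # A hit with an NPNO always replaces what is stored for this SIZE.
            $results{$size} = $row;
        }
        elsif ( not exists $results{$size} ) {
            # A hit without an NPNO is only kept if nothing is stored yet.
            $results{$size} = $row;
        }
    }
    return \%results;
}

# Invented sample data.
my $r = pick_hits(
    { size => 120, query => 'q1', gino => 'g1', npno => undef },
    { size => 120, query => 'q1', gino => 'g2', npno => 'NP_0001' },
    { size => 95,  query => 'q1', gino => 'g3', npno => undef },
);

# Sort the keys so the output order is stable.
for my $size ( sort { $a <=> $b } keys %$r ) {
    my ( $query, $gino, $npno ) = @{ $r->{$size} };
    print "SIZE: $size GINO: $gino NPNO: ", ( $npno // '-' ), "\n";
}
```

Walking the keys in sorted order instead of using `values %results` directly gives reproducible output, since Perl returns hash entries in no particular order.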
Naturally, such a hash might fill up your memory if you have huge piles of data. In that case the hash could be stored on disk with something like DBM::Deep. Another possibility is to write your extracted data as single lines to a second file, with the SIZE at the beginning of each line. Then sort the file (on Unix, as easy as running the 'sort' utility). Afterwards, read the sorted file and, from each run of consecutive lines with the same SIZE, select the one with an NPNO in it.
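The select step after sorting could look roughly like this. It is a sketch that assumes each line starts with the SIZE followed by a tab, that the lines are already sorted so equal SIZEs are adjacent, and that a hit has an NPNO exactly when the string `NPNO` appears in its line; the helper name `select_from_sorted` is hypothetical. If no line in a group has an NPNO, the first line of that group is kept.

```perl
use strict;
use warnings;

# Hypothetical helper: from each run of consecutive lines with the same
# leading SIZE, keep the first line containing "NPNO", else the first line.
sub select_from_sorted {
    my @lines = @_;
    my @picked;
    my ( $cur_size, $best );
    for my $line (@lines) {
        my ($size) = split /\t/, $line, 2;
        if ( defined $cur_size and $size ne $cur_size ) {
            push @picked, $best;    # previous group finished
            undef $best;
        }
        $cur_size = $size;
        # Take the group's first line, then upgrade to the first NPNO line.
        if ( not defined $best or ( $line =~ /NPNO/ and $best !~ /NPNO/ ) ) {
            $best = $line;
        }
    }
    push @picked, $best if defined $best;    # last group
    return @picked;
}

# Invented sample lines, already sorted numerically by SIZE.
my @sorted = (
    "95\tQURY: q1 GINO: g3",
    "120\tQURY: q1 GINO: g1",
    "120\tQURY: q1 GINO: g2 NPNO: NP_0001",
);
my @picked = select_from_sorted(@sorted);
print "$_\n" for @picked;
```

On Unix, `sort -n` on the extracted file keeps lines with the same leading SIZE next to each other, which is all this loop relies on.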
Here is some code to illustrate what I mean (in case you are not used to using hashes):
First, how to store NPNO hits (they always overwrite an existing entry with the same SIZE):

    $results{$length}= "\nQURY: $query\nSIZE: $length\nGINO: $gino\n ...";

And how to store non-NPNO hits (they are only stored if no hit with that SIZE has been seen yet):

    if ( not exists $results{$length} ) {
        $results{$length}= "\nQURY: $query\nSIZE: $length\nGINO: $gino\n ...";
    }

After having read it all in, print out the hash:

    foreach my $text (values %results) { print OUT $text }

EDIT: I didn't notice that all blast hits of a result are grouped together. My solution still works, but pc88mxer's solution is much better.
In reply to Re: Find the first occurance and if not found print the first line
by jethro
in thread Find the first occurance and if not found print the first line
by sm2004