If a result has more than one blast hit, these blast hits all have the same SIZE, right?

In that case you could collect your blast hits in a hash with SIZE as the key. An existing hash entry should then only be replaced by a blast hit of the same SIZE when that blast hit has an NPNO value. This assumes that there is only one blast hit with an NPNO in any result, but you seem to imply that.

As values of the hash you could store either an array with the other values or one string with all the lines of a blast hit.

Naturally, such a hash might fill up your memory when you have huge piles of data. In that case the hash should be stored on disk with something like DBM::Deep. Another possibility would be to write your extracted data as single lines to a second file, with the SIZE at the beginning of each line. Then sort the file (on Unix this is as easy as running the 'sort' utility). Afterwards, read the sorted file and, from consecutive lines with the same SIZE, select the one with an NPNO in it.
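
A rough sketch of that second pass, in case it helps. The line format (SIZE, a tab, then the rest of the hit), the sample data and the name pick_per_size are all made up for illustration, and I assume the text "NPNO" only shows up in hits that really have one. The in-memory sort merely stands in for the external 'sort' step; it breaks ties by input position, which an external sort may not do unless it has a stable option (GNU sort has -s).

```perl
use strict;
use warnings;

# Pick one line per SIZE from lines that are already sorted by SIZE.
# Assumed (hypothetical) line format: "SIZE<TAB>rest of the hit", where the
# text "NPNO" appears only in hits that actually have an NPNO value.
sub pick_per_size {
    my @sorted = @_;
    my ( @picked, $cur_size, $cur_line );
    for my $line (@sorted) {
        my ($size) = split /\t/, $line;
        if ( !defined $cur_size or $size != $cur_size ) {
            # A new SIZE group starts: keep the choice made for the previous one.
            push @picked, $cur_line if defined $cur_line;
            ( $cur_size, $cur_line ) = ( $size, $line );
        }
        elsif ( $cur_line !~ /NPNO/ and $line =~ /NPNO/ ) {
            # Same SIZE: an NPNO hit replaces a non-NPNO one.
            $cur_line = $line;
        }
    }
    push @picked, $cur_line if defined $cur_line;
    return @picked;
}

# Made-up sample data, not taken from the original question.
my @lines = (
    "120\tQURY: q1 GINO: 111",
    "300\tQURY: q2 GINO: 222",
    "120\tQURY: q1 GINO: 333 NPNO: NP_0001",
    "300\tQURY: q2 GINO: 444",
);

# Stand-in for the external sort: numeric by SIZE, ties broken by input order.
my @size   = map { ( split /\t/, $_ )[0] } @lines;
my @sorted = map { $lines[$_] }
    sort { $size[$a] <=> $size[$b] or $a <=> $b } 0 .. $#lines;

print "$_\n" for pick_per_size(@sorted);
```

Only one group has to be held in memory at a time, so this should scale to files that are too big for a hash.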

Here is some code to illustrate what I mean (in case you are not used to using hashes).

First, how to store NPNO hits:

$results{$length} = "\nQURY: $query\nSIZE: $length\nGINO: $gino\n ...";

And how to store non-NPNO hits (only when nothing is stored for that SIZE yet):

if ( not exists $results{$length} ) { $results{$length} = "\nQURY: $query\nSIZE: $length\nGINO: $gino\n ..."; }

After having read it all in, print out the hash:

foreach my $text (values %results) { print OUT $text }
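
Put together, the idea might look roughly like the sketch below. The hit records (hash refs with query, length, gino and npno fields), the sample values and the name select_hits are my own invention, since I don't know how your parser hands over the fields. I also added an @order array, which is not in the snippets above, because values %results returns the entries in no particular order.

```perl
use strict;
use warnings;

# Keep one hit per SIZE: an NPNO hit always wins, otherwise the first hit seen.
# The record layout (query/length/gino/npno) is a hypothetical one.
sub select_hits {
    my @hits = @_;
    my ( %results, @order );
    for my $hit (@hits) {
        my $length = $hit->{length};

        # Remember the order in which each SIZE first appeared.
        push @order, $length unless exists $results{$length};

        if ( defined $hit->{npno} ) {
            $results{$length} = $hit;    # NPNO hit: overwrite whatever is there
        }
        elsif ( not exists $results{$length} ) {
            $results{$length} = $hit;    # non-NPNO hit: only fills an empty slot
        }
    }
    return map { $results{$_} } @order;
}

# Made-up sample data, not taken from the original question.
my @hits = (
    { query => 'q1', length => 120, gino => '111' },
    { query => 'q1', length => 120, gino => '333', npno => 'NP_0001' },
    { query => 'q2', length => 300, gino => '222' },
    { query => 'q2', length => 300, gino => '444' },
);

for my $hit ( select_hits(@hits) ) {
    print "\nQURY: $hit->{query}\nSIZE: $hit->{length}\nGINO: $hit->{gino}\n";
    print "NPNO: $hit->{npno}\n" if defined $hit->{npno};
}
```

With this sample data the q1 result ends up as the hit with GINO 333 (it has an NPNO) and the q2 result as the hit with GINO 222 (the first one seen).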

EDIT: I didn't notice that all blast hits of a result are grouped together. My solution still works, but pc88mxer's solution is much better.

In reply to Re: Find the first occurance and if not found print the first line by jethro
in thread Find the first occurance and if not found print the first line by sm2004
