In your example input, all the clusters occur contiguously, i.e., all Osat_a members (just the one), then all the Atha_b members, then all Fves_d members, etc. Is this the case in your real data, or might you have data like the following,

Osat_a    Osat_a # just one cluster member
Atha_b    Atha_b # >1 cluster member, this & next line = 2 members
Fves_d    Fves_d # this & next 2 lines = 3 cluster members
Osat_h    Osat_h
Atha_b    Mtru_c 
Fves_d    Osat_e
Atha_g    Atha_g # just 1 cluster member
Fves_d    Atha_f
Osat_h    Atha_i
...       ...

where cluster members are promiscuously mingled?

If the former case holds (all members of a cluster are contiguous), processing very large files is easy: just buffer the members of the current cluster until you detect the transition from one cluster to the next, then write out all the buffered members. Memory use is bounded by the largest single cluster rather than by the file size, so this could scale to millions of cluster members.
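For what it's worth, here is a minimal sketch of that buffering approach. It assumes two whitespace-separated columns (cluster ID, then member) and optional trailing `#` comments, as in the sample above. The `process_contiguous` helper and its `$flush` callback are names I made up for illustration; they are not from the original question, and the in-memory filehandle just stands in for your real file.

```perl
use strict;
use warnings;

# Hypothetical helper: read "cluster member" pairs from a filehandle and
# call $flush->($cluster_id, \@members) once per contiguous run of a cluster.
sub process_contiguous {
    my ($in, $flush) = @_;
    my ($current, @members);
    while (my $line = <$in>) {
        $line =~ s/#.*//;                          # drop trailing comments
        my ($cluster, $member) = split ' ', $line;
        next unless defined $member;               # skip blank/malformed lines
        if (defined $current and $cluster ne $current) {
            $flush->($current, [@members]);        # cluster changed: emit buffer
            @members = ();
        }
        $current = $cluster;
        push @members, $member;
    }
    $flush->($current, [@members]) if defined $current;   # emit final cluster
    return;
}

# Usage: collect one output line per cluster from contiguous sample data.
my $data = <<'END';
Osat_a    Osat_a # just one cluster member
Atha_b    Atha_b
Atha_b    Mtru_c
Fves_d    Fves_d
Fves_d    Osat_e
Fves_d    Atha_f
END
open my $in, '<', \$data or die "Cannot open in-memory file: $!";
my @out;
process_contiguous($in, sub {
    my ($cluster, $members) = @_;
    push @out, "${cluster}: @$members";
});
print "$_\n" for @out;
```

Only one cluster is held in memory at a time. Note that if the input turns out not to be contiguous after all, this sketch will silently emit the same cluster more than once, so it is worth checking that assumption first.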

In the latter case, something like LanX's suggestion seems the way to go.

Give a man a fish: <%-{-{-{-<

In reply to Re: Processing while reading in input by AnomalousMonk
in thread Processing while reading in input by onlyIDleft
