
You probably should be using a database, but if you are dead set against that for some reason...
My first thought is: how have you determined that this record matches another? By looking at other records? If so, why break the "clumping" into a separate pass?

Tailoring a better solution will depend on what your data looks like: what changes, what doesn't, etc. Should each output record contain every field found in any of its matching records?

If you're on *nix or using Cygwin, you could make the match key the first field of each line and then pipe the file into GNU sort (it's good at handling large files and pretty fast). Then all your reads will be sequential, and you'll only have to hold the current group of matching records in memory. Hope this helps.
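A rough sketch of the sort-then-scan idea, written in Python rather than Perl just to show the shape of it. Everything here is an assumption for illustration: records are tab-separated with no tabs inside fields, the match key is built from a hypothetical choice of columns (lowercased and joined with `|`), and the merge policy (first non-empty value per column wins) is only one possible answer to the "which fields survive" question. In practice the keyed file would go through GNU sort (e.g. `sort -t` with a tab separator and `-k1,1`), and the scan would read the sorted output line by line.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple


def keyed_line(record: List[str], key_fields: Tuple[int, ...]) -> str:
    """Prefix a record with its match key.

    Hypothetical key: the chosen columns, stripped, lowercased, joined by '|'.
    """
    key = "|".join(record[i].strip().lower() for i in key_fields)
    return key + "\t" + "\t".join(record)


def clumps(sorted_lines: Iterable[str]) -> Iterator[Tuple[str, List[List[str]]]]:
    """Walk lines already sorted by key, yielding one group of matches at a time.

    Because the input is sorted, records with equal keys are adjacent, so only
    the current group is ever held in memory.
    """
    split = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for key, group in itertools.groupby(split, key=lambda parts: parts[0]):
        yield key, [parts[1:] for parts in group]


def merge(group: List[List[str]]) -> List[str]:
    """Hypothetical merge policy: first non-empty value in each column wins."""
    width = max(len(r) for r in group)
    merged = []
    for col in range(width):
        values = (r[col] for r in group if col < len(r) and r[col].strip())
        merged.append(next(values, ""))
    return merged
```

For a quick test without GNU sort, `sorted()` can stand in: build keyed lines with `keyed_line`, sort them, then feed them to `clumps` and call `merge` on each group. Any consistent sort order works, since `groupby` only needs equal keys to be adjacent.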

-Lee

"To be civilized is to deny one's nature."

In reply to Re: Merge Purge by shotgunefx
in thread Merge Purge by krazken