http://qs1969.pair.com?node_id=11140269


in reply to Re^2: Get unique fields from file
in thread Get unique fields from file

> Depending upon the data of course, your HoH (hash of hash) structure could consume quite a bit more memory than the actual file size in MB.

This shouldn't be a problem if you apply a sliding window technique° and split the hashes into easily swappable chunks².

The trick is to balance time, space and disk access by minimizing the number of swaps.

This will scale well, up to the limit set by the available disk space.
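
A rough sketch of how that could look, under a few assumptions: the "window" is read here as a batch of records buffered in memory, the chunks are plain hashes serialized with Storable, and the names `chunk_of`, `flush_window` and `count_unique` as well as the parameters `$window` and `$n_chunks` are made up for illustration. It is one possible reading of the idea, not a tuned implementation.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Storable   qw(store retrieve);

# Hypothetical helper: map a key to one of $n_chunks chunk ids
# using a simple byte checksum (keys are assumed to be byte strings).
sub chunk_of {
    my ($key, $n_chunks) = @_;
    return unpack('%32C*', $key) % $n_chunks;
}

# Merge the buffered counts into the on-disk chunks.
# Each touched chunk is loaded and stored once per flush.
sub flush_window {
    my ($buffer, $dir) = @_;    # $buffer: { chunk_id => { key => count } }
    for my $id (keys %$buffer) {
        my $file  = "$dir/chunk_$id.sto";
        my $chunk = -e $file ? retrieve($file) : {};
        $chunk->{$_} += $buffer->{$id}{$_} for keys %{ $buffer->{$id} };
        store($chunk, $file);
    }
    %$buffer = ();
}

# Count distinct keys while buffering at most $window records
# and holding one chunk in memory at a time.
sub count_unique {
    my ($records, $window, $n_chunks) = @_;
    my $dir     = tempdir(CLEANUP => 1);    # chunk files live here
    my %buffer;
    my $pending = 0;

    for my $key (@$records) {
        $buffer{ chunk_of($key, $n_chunks) }{$key}++;
        if (++$pending >= $window) {
            flush_window(\%buffer, $dir);
            $pending = 0;
        }
    }
    flush_window(\%buffer, $dir);           # flush the last partial window

    # Visit the chunks one by one and add up their distinct keys.
    my $unique = 0;
    for my $id (0 .. $n_chunks - 1) {
        my $file = "$dir/chunk_$id.sto";
        next unless -e $file;
        $unique += scalar keys %{ retrieve($file) };
    }
    return $unique;
}

print count_unique([qw(a b a c b d)], 2, 4), "\n";
```

A larger `$window` means fewer swaps but more memory per batch, and a larger `$n_chunks` means smaller chunks but more files to touch per flush, which is the balance mentioned above.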

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

°) see

²) see