in reply to memory-efficient hash kind for incremental sort
I frankly suggest that you do it the old-fashioned way: use a disk-based sorting engine.
Sorting can very easily remove all duplicates in the process of doing the sort... and there you have it: your intended result.
You see, the problem with “doing it in memory” is that, in the general case, “memory isn't.” In the general case, memory is a disk drive. Even if you have gobs and gobs of RAM, every one of those page faults costs money... and your application also becomes a million-pound elephant that's going to be stepping on everything nearby.
I'd start by perusing CPAN's Sort::External::Cookbook, and in particular the paper it refers to, A Fresh Look at Perl Sorting, by Uri Guttman and Larry Rosler.
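To make that concrete, here's a minimal sketch using Sort::External's feed/finish/fetch interface from its synopsis. The file name keys.txt and the assumption that your keys are fixed-width (so the module's default lexical sort gives you the order you want) are mine, not from the module or the original question:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Sort::External;

    # Hypothetical input: one fixed-width key per line in keys.txt.
    my $sortex = Sort::External->new(
        mem_threshold => 2**24,    # spill sorted runs to temp files past ~16 MB
    );

    open my $fh, '<', 'keys.txt' or die "keys.txt: $!";
    while ( my $key = <$fh> ) {
        chomp $key;
        $sortex->feed($key);       # buffered in RAM, written to disk as needed
    }
    close $fh;

    $sortex->finish;               # sort the last buffer and set up the merge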
The advantage of this approach (especially when you ask the sort engine to remove the duplicates for you as it goes) is that it's a generalized solution to the problem, and it's known to scale up well. A good sorting engine (and CPAN, natcherly, is full of them...) will use internal sorting for small lists and switch to external sorting at some appropriate point. It usually also does some filtering along the way to drop duplicate keys that land near one another in the input. All of this is done for you, transparently.
What this does is completely eliminate “searching.” You get a single copy of every key that is present (and identical keys, if you choose to keep them, are always adjacent). You can instantly detect any gaps in the sequence. And it's much faster than you think.
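Continuing the sketch above, and assuming the keys are integers fed in a fixed-width form so that sorted order equals numeric order: because duplicates come back adjacent, both de-duplication and gap detection cost exactly one comparison against the previous key.

    # Fetch the merged, sorted stream: skip adjacent duplicates, report gaps.
    my $prev;
    while ( defined( my $key = $sortex->fetch ) ) {
        next if defined $prev && $key eq $prev;        # duplicate: skip it
        if ( defined $prev && $key != $prev + 1 ) {    # hole in the sequence
            printf "gap: %d .. %d missing\n", $prev + 1, $key - 1;
        }
        print "$key\n";
        $prev = $key;
    }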