And I'd look into finding the sort order at the moment you're creating the huge hash.
The big hash is effectively uniq'ing hundreds or thousands of millions of inputs down to a few million keys. Those inputs are "words" extracted from "lines" read from an input file, and the sort is there to order those "words". There's no way to determine the ordering without sorting.
But I would go a different way: write all the keys to a file, one key per line. Call sort(1). Read the sorted file. Assign $. to each value.
Can you extract 10 million keys from a hash (without blowing memory), write them to a file, start an external process (without forcing the current process to be swapped out) that reads them from that file and writes them to other files several times, and then read them back into memory from the file, all in 108 seconds?
If so, you might be onto something.
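For anyone who wants to try timing it, here is a minimal sketch of the file-and-sort(1) route quoted above. It is only an illustration under stated assumptions: the sub name assign_order_via_sort and the sample hash are made up for this example, a Unix sort is assumed to be on the PATH, the keys are assumed to contain no newlines, and LC_ALL=C is set so that sort(1) compares bytes. It makes no claim about fitting inside the 108 seconds; that would have to be measured on the real data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: overwrite each value in %$hash with the 1-based
# position of its key in sort(1) order. Assumes keys contain no newlines.
sub assign_order_via_sort {
    my ($hash) = @_;

    # Dump the keys one per line. each() walks the hash without
    # building a list of all the keys in memory first.
    my ($out, $unsorted) = tempfile(UNLINK => 1);
    while (defined(my $key = each %$hash)) {
        print {$out} $key, "\n";
    }
    close $out or die "close $unsorted: $!";

    # Byte-wise comparison in the child process (assumption: that is
    # the ordering wanted).
    local $ENV{LC_ALL} = 'C';

    # List-form pipe open: no shell involved, sort(1) reads the temp file.
    open my $in, '-|', 'sort', $unsorted or die "cannot run sort: $!";
    while (my $key = <$in>) {
        chomp $key;
        $hash->{$key} = $.;    # $. is the line number of the sorted stream
    }
    close $in or die "sort failed: $! (exit status $?)";

    return $hash;
}

# Tiny demonstration with made-up data.
my %words = map { $_ => 0 } qw(pear apple fig banana);
assign_order_via_sort(\%words);
printf "%-6s %d\n", $_, $words{$_} for sort keys %words;
# apple => 1, banana => 2, fig => 3, pear => 4
```

The sort keys in the last line is only there to print the demo; on the real hash it would build exactly the in-memory key list this approach is trying to avoid.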
In reply to Re^2: In-place sort with order assignment
by BrowserUk
in thread In-place sort with order assignment
by BrowserUk