And I'd look into finding the sort order at the moment you're creating the huge hash.

The big hash is effectively uniq'ing hundreds or thousands of millions of inputs down to millions of keys. Those inputs are "words" extracted from "lines" read from an input file, and the sort is there to order those "words". There's no way to determine that ordering without sorting.

But I would go a different way: write all the keys to a file, one key per line. Call sort(1). Read the sorted file. Assign $. to each value.

Can you extract 10 million keys from a hash (without blowing memory); write them to a file; start an external process (without forcing the current process to be swapped out) that reads them from that file and writes them out to other files several times over; and then read them back into memory from the sorted file; all in 108 seconds?

If so, you might be onto something.
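
For anyone who wants to put a number on it, here is a minimal sketch of the write/sort(1)/read-back approach quoted above, so it can be timed against that 108 seconds. It is only one way to do it, under stated assumptions: the sub name rank_keys_via_sort is made up for illustration, keys are assumed to contain no newlines, and LC_ALL=C is set so that sort(1) collates by bytes. None of that comes from the original suggestion.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Hypothetical helper: overwrite each value in %$h with the 1-based
# position of its key in sort(1) order.
# Assumption: no key contains a newline.
sub rank_keys_via_sort {
    my ($h) = @_;

    # Write one key per line. each() walks the hash without first
    # building a list of every key in memory.
    my ($out, $unsorted) = tempfile(UNLINK => 1);
    while (my ($k) = each %$h) {
        print {$out} $k, "\n";
    }
    close $out or die "close $unsorted: $!";

    # Let the external sort do the ordering (byte order via LC_ALL=C).
    my (undef, $sorted) = tempfile(UNLINK => 1);
    local $ENV{LC_ALL} = 'C';
    system('sort', '-o', $sorted, $unsorted) == 0
        or die "sort(1) failed: $?";

    # Read the sorted keys back; $. is the current input line number,
    # which is exactly the rank of the key on that line.
    open my $in, '<', $sorted or die "open $sorted: $!";
    while (my $k = <$in>) {
        chomp $k;
        $h->{$k} = $.;
    }
    close $in;

    return $h;
}

# Tiny usage example; scale the key count up to get a meaningful timing.
my %h = map { $_ => 0 } qw(pear apple fig);
my $t0 = time;
rank_keys_via_sort(\%h);
printf "ranked %d keys in %.3f s\n", scalar(keys %h), time - $t0;
```

The elapsed time printed covers all three stages (dump, external sort, read back), which is the figure the question above is asking about. It says nothing about memory pressure or swapping, which would have to be observed separately.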


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

In reply to Re^2: In-place sort with order assignment by BrowserUk
in thread In-place sort with order assignment by BrowserUk
