Not to be argumentative, but are you sure you need to sort this list?

If your goal is to identify duplicates in the list, you could take a hash (a digest such as MD5, not a Perl %hash) of each list element, sort the hashes, and look for duplicates in the hash space. Each digest is only 16 bytes (MD5) or 20 bytes (SHA-1), so sorting them touches far less data than sorting the strings themselves. A hash function like MD5 or SHA-1 should reliably distinguish even 2 GB strings, and if you do find duplicates, you can always verify them against the primary data.
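A minimal sketch of the digest-then-sort idea, written in Python for illustration. It assumes each element can be read as a byte stream, so that a multi-gigabyte element is digested in chunks rather than loaded whole. The names `md5_of_stream`, `duplicate_candidates`, `confirm`, and the `fetch` callback are hypothetical helpers, not part of any existing module.

```python
import hashlib
from typing import BinaryIO, Callable, Iterable, List, Tuple

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time; an arbitrary choice

def md5_of_stream(stream: BinaryIO) -> bytes:
    """Digest one (possibly huge) element without holding it all in memory."""
    h = hashlib.md5()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        h.update(chunk)
    return h.digest()  # 16 bytes, regardless of input size

def duplicate_candidates(digests: Iterable[bytes]) -> List[Tuple[int, int]]:
    """Sort (digest, index) pairs; equal digests end up adjacent."""
    keyed = sorted((d, i) for i, d in enumerate(digests))
    return [(i1, i2) for (d1, i1), (d2, i2) in zip(keyed, keyed[1:]) if d1 == d2]

def confirm(pairs: List[Tuple[int, int]],
            fetch: Callable[[int], bytes]) -> List[Tuple[int, int]]:
    """Keep only pairs whose primary data really are equal.

    `fetch` is a hypothetical callback that re-reads element i from the source.
    For truly huge elements you would compare the two streams chunk by chunk
    instead of materializing both.
    """
    return [(i, j) for i, j in pairs if fetch(i) == fetch(j)]
```

Usage on toy data:

```python
import io
data = [b"apple" * 1000, b"banana", b"apple" * 1000]
digests = [md5_of_stream(io.BytesIO(x)) for x in data]
print(confirm(duplicate_candidates(digests), lambda i: data[i]))  # [(0, 2)]
```

Only the digests and their indices live in memory during the sort; the original strings are touched again only for the (presumably rare) candidate pairs.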

If these strings are expected to be dissimilar, it may suffice to sort them by their first 1000 characters and then use a separate procedure to deal with the cases where the first 1000 characters are identical.
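A rough Python sketch of the prefix approach, under the assumption that you can cheaply extract each element's first 1000 characters and, separately, reload a full element on demand. `sort_by_prefix` and the `fetch_full` callback are hypothetical names; here the "separate procedure" is simply a full-string comparison among elements whose prefixes tie.

```python
from itertools import groupby
from typing import Callable, List, Sequence

def sort_by_prefix(prefixes: Sequence[str],
                   fetch_full: Callable[[int], str]) -> List[int]:
    """Return element indices in fully sorted order.

    prefixes[i] holds only the leading characters of element i (e.g. 1000).
    fetch_full is a hypothetical callback that loads element i in full; it is
    called only for elements whose prefixes tie.
    """
    # Main pass: order everything by prefix alone (cheap, small keys).
    order = sorted(range(len(prefixes)), key=lambda i: prefixes[i])
    result: List[int] = []
    for _, group in groupby(order, key=lambda i: prefixes[i]):
        tied = list(group)
        if len(tied) > 1:
            # Separate pass: break ties by comparing the full strings.
            # This loads every tied element, so it assumes ties are rare.
            tied.sort(key=fetch_full)
        result.extend(tied)
    return result
```

Usage, with a 6-character prefix instead of 1000 so the toy data has a tie:

```python
data = ["banana split", "apple", "banana boat", "cherry"]
print(sort_by_prefix([s[:6] for s in data], lambda i: data[i]))  # [1, 2, 0, 3]
```

This ordering is correct because two strings whose prefixes differ compare the same way as their prefixes do (a string shorter than the prefix length is its own prefix), so only exact prefix ties need the full data.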

This seems like a time to step back and think about the larger goal and any a priori knowledge of the data to be sorted.


In reply to Re: Sorting Gigabytes of Strings Without Storing Them by eye
in thread Sorting Gigabytes of Strings Without Storing Them by neversaint