I've found that sorting large files (i.e. millions of lines of text, 100s of MB in size) is, in practice, very hard to do efficiently in Perl. It's possible to get solutions, e.g. via CPAN modules, that work for certain types of data or certain data set sizes. It's also easy to introduce memory-, CPU-, or disk-hogging behavior and then spend lots of time tracking it down.

If you don't need portability to platforms that lack a 'sort' command, I'd suggest following graff's advice: transform the data, then hand it to Unix's sort command. This kind of implementation is straightforward, quick, and generally robust.
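
For what it's worth, here is a minimal sketch of that transform-then-sort idea. It is only my reading of the approach, not graff's actual code: sort_file_by_key and the "$out.tmp" scratch file are names I made up, and it assumes a Unix-like system with sort(1) on the PATH and keys that contain no tab or newline. Perl only ever holds one line in memory; the external sort does the heavy lifting.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): sort the lines of $in into $out,
# ordered by the string that $key_of returns for each line.
# Assumptions: sort(1) is available, and keys contain no tab or newline.
sub sort_file_by_key {
    my ($in, $out, $key_of) = @_;
    my $tmp = "$out.tmp";    # scratch file holding the decorated, sorted lines

    open my $src, '<', $in or die "Cannot read $in: $!";

    # Byte-wise collation, so the order does not depend on the user's locale.
    local $ENV{LC_ALL} = 'C';

    # List-form pipe open: no shell is involved, so file names need no quoting.
    open my $sorter, '|-', 'sort', '-t', "\t", '-k1,1', '-o', $tmp
        or die "Cannot start sort: $!";

    while (my $line = <$src>) {
        # Transform: prefix each line with its sort key and a tab.
        print {$sorter} $key_of->($line), "\t", $line;
    }
    close $src;
    close $sorter or die "sort failed (wait status $?)";

    # Strip the key prefix again while copying to the final output file.
    open my $sorted, '<', $tmp or die "Cannot read $tmp: $!";
    open my $dst,    '>', $out or die "Cannot write $out: $!";
    while (my $line = <$sorted>) {
        $line =~ s/\A[^\t]*\t//;
        print {$dst} $line;
    }
    close $sorted;
    close $dst or die "Cannot finish writing $out: $!";
    unlink $tmp;
    return;
}

# Example call (hypothetical file names): order CSV lines by a numeric
# second column, zero-padded so that a plain string sort gives numeric order.
# sort_file_by_key('big.csv', 'big.sorted.csv',
#     sub { sprintf '%010d', (split /,/, $_[0])[1] });
```

The zero-padding in the example is the "transform" step: it lets a plain byte-wise sort stand in for a numeric one. If the key really is a single numeric field, passing -n to sort instead may be simpler.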


In reply to Re: perl sort versus Unix sort by bluto
in thread perl sort versus Unix sort by Anonymous Monk
