I hate to ask this, but are you sure it's the sort line that's the culprit, or could some other manipulation be causing the out-of-memory problems? I've done lots of sort-and-assigns just like you're doing, and even on very large arrays of hashrefs (circa 100k elements) the overhead from the sort is never more than a few hundred kilobytes.

Now to digress (or possibly not) there is one behavior peculiar to sorting arrays of references that I don't understand (and perhaps this -- or a variant -- is what's biting you)...

    # for @foo with 100,000 elements, this sort eats 12k of memory
    @foo = sort { $a->{bar} cmp $b->{bar} } @foo;

    # but for the same @foo, this sort eats 90M!
    @foo = sort @foo;
    @foo = sort { $a cmp $b } @foo;   # equivalent

As far as I can tell, this "bloat" happens when you try to sort any list of references with the default comparison operator. (I'm running 5.6.1 on linux.) It doesn't happen just because you compare two references inside a sort block...

    # requires scads of memory
    @array_of_refs = sort { $a cmp $b } @array_of_refs;

    # doesn't
    @array_of_simple_scalars = sort { \$a cmp \$b } @array_of_simple_scalars;

I would think that the default sort on @array_of_refs would be doing a lexical comparison on the "stringified" ref. But apparently, that's not the case. Even attempts to force "stringification" inside the sort block (while still referring to the ref itself) don't fix the problem...

    # scads
    @array_of_stringrefs = sort { ('a: '.$a) cmp ('b: '.$b) } @array_of_stringrefs;

    # scadless
    @array_of_stringrefs = sort { ('a: '.$$a) cmp ('b: '.$$b) } @array_of_stringrefs;
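For anyone who wants to try reproducing the numbers, here is a rough sketch of one way to measure them. It assumes Linux and reads the VmRSS line from /proc/self/status; the helper names rss_kb and rss_growth and the test data are made up for illustration, and this is not necessarily how the figures above were obtained. Results will likely differ between perl versions.

```perl
use strict;
use warnings;

# Hypothetical helper: resident set size in kB, read from /proc (Linux only).
# Returns undef if /proc/self/status is unavailable.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

# Hypothetical helper: run a code ref and report how much the RSS changed, in kB.
sub rss_growth {
    my ($code) = @_;
    my $before = rss_kb();
    $code->();
    my $after = rss_kb();
    return (defined $before && defined $after) ? $after - $before : undef;
}

# Illustrative data: 100,000 hashrefs, each with a 'bar' field.
my @foo = map { +{ bar => "item$_" } } 1 .. 100_000;

# Sort on a field of each hashref.
my $by_field = rss_growth(sub { @foo = sort { $a->{bar} cmp $b->{bar} } @foo });

# Sort on the references themselves.
my $by_ref = rss_growth(sub { @foo = sort { $a cmp $b } @foo });

printf "sort by field: %s kB\n", defined $by_field ? $by_field : 'n/a';
printf "sort by ref:   %s kB\n", defined $by_ref   ? $by_ref   : 'n/a';
```

RSS is a coarse measure (perl may not hand freed memory back to the OS), so it is probably best to run each case in a fresh process and compare.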

Curiouser and curiouser. Can anyone shed any light on what might be going on here?

Kwin

In reply to Re: Sorting a large data set by khkramer
in thread Sorting a large data set by jlf
