you can't even load the @lines array of a 400MB file in 1GB of RAM

I believe there is this thing called "virtual memory"... Note that in my original reply I also noted:

And I'd check how much paging space is configured. It sounds like that could be increased.

Virtual memory can be very useful, especially if the thing you actually sort fits in physical memory. Then you only page through the larger data a few times, rather than having sort page through it something like O(log(N)) times.

Hence my creation of a quite small array that is easy to sort. Actually, this makes me think of a minor change that might have a non-trivial impact on the size of the array that I sort:

my @size= map 0 + ( split /\t/, $_ )[2], @in;
#             ^^^ store numbers not strings
my @idx= sort { $size[$a] <=> $size[$b] } 0..$#in;
@in= @in[@idx];
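
For anyone who wants to try the idea on something small first, here is a minimal, self-contained sketch. The sample records and the @sorted array are made up for illustration; the only assumption carried over from the thread is that the numeric key is the third tab-separated field.

```perl
use strict;
use warnings;

# Hypothetical sample records: tab-separated, numeric size in the third field.
my @in = (
    "alpha\tx\t300",
    "beta\ty\t20",
    "gamma\tz\t1000",
    "delta\tw\t5",
);

# Extract the sort key once per record; adding 0 stores a number, not a string.
my @size = map { ( split /\t/, $_ )[2] + 0 } @in;

# Sort the small array of indices rather than the records themselves.
my @idx = sort { $size[$a] <=> $size[$b] } 0 .. $#in;

# Reorder the records with an array slice.
my @sorted = @in[@idx];

print "$_\n" for @sorted;
```

The comparison block only ever touches @size and the indices, so the big records are read once to build the keys and once more for the slice.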

You could also play games to reduce the number of times you run through @in (each time causing paging), but I think the gains from that would be relatively minor in comparison while the complexity required is relatively great. Except, perhaps, for the last iteration:

for( @idx ) { print $in[$_], $/; }
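
A small sketch of that last step, assuming the records were chomped when read. The helper name print_in_order and the in-memory filehandle are only there so the example can be run and checked; printing to STDOUT works the same way.

```perl
use strict;
use warnings;

# Hypothetical helper: write records in index order to a filehandle,
# without first building a reordered copy of the array.
sub print_in_order {
    my ( $fh, $records, $order ) = @_;
    print {$fh} $records->[$_], "\n" for @$order;
    return;
}

my @in  = ( "c\t3", "a\t1", "b\t2" );    # sample records, already chomped
my @idx = ( 1, 2, 0 );                   # order produced by an earlier index sort

# Write to an in-memory filehandle so the result can be inspected.
open my $fh, '>', \my $out or die "Cannot open in-memory handle: $!";
print_in_order( $fh, \@in, \@idx );
close $fh;

print $out;
```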

(Though, I half expect Perl to do @in= @in[@idx]; in-place.)

- tye        


In reply to Re^8: Sorting a (very) large file (memory) by tye
in thread Sorting a (very) large file by condar
