This made me laugh out loud. Without pre-extending, the main difference is two things: a (likely) hole in the heap of size 2**(n-1), left behind when the array was last doubled to make it big enough to hold the entire file contents, and the slop at the end of the array between the number of lines and 2**n (the final size of the array). In other words, two "dead" spans of virtual memory that will (at least mostly) remain untouched during the sorting, so they will cause no page faulting and won't slow down the sorting at all (they will only moderately slow down the loading of the file contents into memory).
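To put rough numbers on those two dead spans, here is a minimal sketch that takes the doubling model described above at face value. The sub name doubling_model and the 40_000_000 line count are made up for illustration, sizes are counted in array slots rather than bytes, and the growth policy of any particular perl build may not be an exact doubling.

```perl
use strict;
use warnings;

# Hypothetical helper: models an array whose capacity doubles until it can
# hold every line. Returns (capacity, slop, hole) in array slots, not bytes.
sub doubling_model {
    my ($lines) = @_;
    my $capacity = 1;
    $capacity *= 2 while $capacity < $lines;   # final size, 2**n
    my $slop = $capacity - $lines;             # unused slots at the end of the array
    my $hole = int($capacity / 2);             # 2**(n-1): buffer abandoned by the last doubling
    return ($capacity, $slop, $hole);
}

# 40_000_000 is an assumed line count, not a figure from the thread.
my ($capacity, $slop, $hole) = doubling_model(40_000_000);
printf "capacity=%d slop=%d hole=%d\n", $capacity, $slop, $hole;
```

Neither span is touched again once loading finishes, which is why they add to the virtual size without adding to the pages the sort actually needs.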

So pre-extending shows that the lines being sorted can fit within 800MB of memory and so can be sorted directly without much, if any, page faulting. It cracks me up that samtregar threw up his hands in defeat, thinking there's no point in trying to sort what won't fit in memory, when the only extra page faulting caused by the process's virtual size exceeding physical memory had already taken place while filling the array. The rest of the sorting would only require pages totaling about 800MB.

I also found it amusing that Sort::Key is doing pretty much exactly what I did except replacing my 3 lines of Perl code with a big pile of complex XS code.

And then there is the problem that pre-extending requires reading the file twice, a cost that nobody could discount unless they were overlooking how much slower disk is than memory.
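For reference, the two-pass pattern being criticized looks roughly like this. It is only a sketch: the thread does not show how the pre-extension was actually done, and the sub name load_preextended is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: one pass to count lines, then pre-extend the array
# and read the file a second time to fill it.
sub load_preextended {
    my ($path) = @_;

    open my $fh, '<', $path or die "Cannot open $path: $!";

    # First pass: count lines only, keep nothing.
    my $count = 0;
    while (defined(my $line = <$fh>)) {
        $count++;
    }

    my @lines;
    $#lines = $count - 1;    # pre-extend to the exact final size

    # Second pass: rewind and read every line again, this time storing it.
    seek $fh, 0, 0 or die "Cannot rewind $path: $!";
    my $i = 0;
    while (defined(my $line = <$fh>)) {
        $lines[$i++] = $line;
    }
    close $fh;

    return \@lines;
}
```

The second loop is the cost in question: every byte of the file comes off disk (or out of the OS cache, if it fits) a second time just to avoid the hole and the slop.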

Of course, condar may need to allocate a bit more paging space. I can't tell how much of his running out of virtual memory space was due to memory wasted by the ST or just due to having way too little paging space configured.

- tye        


In reply to Re^8: Sorting a (very) large file (LOL) by tye
in thread Sorting a (very) large file by condar
