I should have noticed the quotes myself the first time. :(

Since the command-line solution has the side-effect of translating the line-endings, I went with the scripted version from another reply. But I eventually returned to this command-line version to see how it works.

Good news: your $_ pre-allocation trick works, mostly.

Bad news: I had to guess at the right amount. Too low, and it still crawls at some point when it reads the huge record. Too high, and it crawls at the beginning while trying to pre-allocate $_. It ran without crawling only for values between 270*1024*1024 and 300*1024*1024, which is pretty limiting for a general solution. The binmode-script (in this thread) is the best general solution.
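For anyone reading along, here is a minimal sketch of the pre-allocation idea combined with binmode. It is not the code from the thread. The sub name read_sorted_records, its arguments, and the assumption of newline-terminated records are all made up for illustration. The idea is to grow $_ once to the guessed size, then empty it so that later reads can reuse the buffer rather than growing it piece by piece.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read newline-terminated
# records from $path in binary mode and return them sorted.
# $prealloc_bytes is the guessed size of the largest record.
sub read_sorted_records {
    my ($path, $prealloc_bytes) = @_;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # disable line-ending translation

    # Grow the buffer behind $_ once, then empty the string.
    # Assumption: perl normally keeps the allocated buffer after the
    # string is emptied, so the reads below can reuse it.
    local $_ = "\0" x $prealloc_bytes;
    $_ = '';

    my @records;
    while (<$fh>) {
        push @records, $_;    # copies the current record out of $_
    }
    close $fh;

    return [ sort @records ];
}
```

Something like read_sorted_records('data.txt', 280 * 1024 * 1024) would match the range that worked for me, but the right value depends on the largest record in the file, which is exactly the guessing problem described above.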

Still, if line-ending translation is OK, and I'm NOT running into single records/lines that are hundreds of MB in size, this is a reasonably fast and easy way to sort the records. On the 900MB source file, the command-line version took 2m15s compared to the binmode-script at 1m45s.

Thanks very much for your help and insight.


In reply to Re^4: Thrashing on very large lines by chr1so
in thread Thrashing on very large lines by chr1so
