The sort should be on the unpacked epoch value.

No. A big advantage is that your packed binary epoch dates sort perfectly well without being unpacked, provided that you use a string comparison (e.g. cmp) rather than a numeric one (<=>). And they will sort faster. This is the basis of the Guttman-Rosler Transform (GRT) sort.

To convince you of this, look at the binary representation of the following "epochs". Remember that I am running on a little-endian machine, where the native byte order is reversed; packed in network (big-endian) order with 'N', each (numerically) bigger number is represented by an alphanumerically larger string.

Update: Tye's right, you need 'N' not 'V'

[0] Perl> print unpack 'H*', pack 'N', 0+"1e$_" for 0 .. 10;;
00000001
0000000a
00000064
000003e8
00002710
000186a0
000f4240
00989680
05f5e100
3b9aca00
ffffffff
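
To see why the template matters, here is a minimal sketch that sorts the same values as packed strings under both 'N' and 'V' and compares the results with a numeric sort. The sample values and the sort_packed helper are my own, made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample values, chosen so that byte order matters.
my @epochs = ( 256, 10, 1, 65_536, 255 );

# Numeric sort gives the reference ordering.
my @numeric = sort { $a <=> $b } @epochs;

# Pack every value with the given template, sort the packed strings
# with cmp, then unpack the sorted strings again.
sub sort_packed {
    my ( $template, @values ) = @_;
    return map  { unpack $template, $_ }
           sort { $a cmp $b }
           map  { pack $template, $_ } @values;
}

my @big_endian    = sort_packed( 'N', @epochs );    # network order
my @little_endian = sort_packed( 'V', @epochs );    # VAX order

print "numeric: @numeric\n";
print "N:       @big_endian\n";
print "V:       @little_endian\n";
```

With 'N' the most significant byte comes first, so the string order agrees with the numeric order. With 'V' the least significant byte comes first, and the string order no longer matches.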

So, using the default sort on packed integers works fine provided that you pack them in big-endian (network) order, 'N', whatever your platform's native endianness. The bonus is that this is the fastest sort, and by appending the offsets (also packed with 'N'), any equal epochs will be sorted into file order. Try it.
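
A minimal sketch of the whole idea, assuming each record is just an epoch plus a byte offset and that both fit in 32 bits unsigned. The sample pairs are invented for illustration, not taken from the OP's file.

```perl
use strict;
use warnings;

# Hypothetical [ epoch, byte offset ] pairs, as if collected while
# scanning a file. Two records share the same epoch on purpose.
my @records = (
    [ 1_000_000_000, 0   ],
    [ 10,            57  ],
    [ 1_000_000_000, 112 ],
    [ 256,           170 ],
    [ 1,             231 ],
);

# Pack each pair as two 32-bit big-endian unsigned integers.
my @packed = map { pack 'N N', @$_ } @records;

# Sort the packed strings directly; nothing is unpacked while sorting.
my @sorted = sort { $a cmp $b } @packed;

# Unpack once, after the sort, to recover epoch and offset.
my @result = map { [ unpack 'N N', $_ ] } @sorted;

printf "%10d  %d\n", @$_ for @result;
```

Because the offset follows the epoch in each packed string, it acts as the tie-breaker: the two records with epoch 1000000000 come out with offset 0 before offset 112. In real use you would drop the comparison block altogether, since the default sort is already a string sort.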


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^3: Sort big text file - byte offset - 50% there (Added code) by BrowserUk
in thread Sort big text file - byte offset - 50% there by msalerno
