in reply to Linux sort or FILE::SORT

The sort utility in the shell is a C program optimised for sorting. It will always be faster than loading Perl and calling a module, regardless of the efficiency of the module (assuming someone doesn't come up with a spectacularly fast new sort algorithm ;)

print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."

Re^2: Linux sort or FILE::SORT
by BrowserUk (Patriarch) on Feb 08, 2011 at 11:04 UTC

    It is actually quite easy to beat gnusort for performance.

    Firstly, it uses minuscule buffers relative to modern memory sizes, necessitating far more write/read cycles than could ever be optimal.

    Secondly, with most machines having multiple cores these days, it is silly not to utilise them.
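To put a rough number on the first point: an external sort that cannot hold the data in memory writes out one sorted run per buffer-full and then reads every run back to merge it. The sketch below only does that arithmetic. `initial_runs` is a hypothetical helper name, and the sizes in the usage comment are illustrative, not measurements of any particular sort's defaults.

```shell
# initial_runs FILE_BYTES BUFFER_BYTES
# Hypothetical helper: how many sorted runs an external sort has to write out
# (and later read back for merging) when the data does not fit in the buffer.
initial_runs() {
    file_bytes=$1
    buffer_bytes=$2
    # Ceiling division: a partly filled final buffer still costs one run.
    echo $(( (file_bytes + buffer_bytes - 1) / buffer_bytes ))
}

# Usage (illustrative sizes only):
#   initial_runs 1073741824 1048576      # 1 GiB of data, 1 MiB buffer
#   initial_runs 1073741824 268435456    # 1 GiB of data, 256 MiB buffer
```

With a 1 MiB buffer the 1 GiB case needs 1024 runs; with a 256 MiB buffer it needs 4, which is where the saving in write/read cycles comes from.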


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      It does take the -S $BUFFER_SIZE flag.
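For anyone who wants to try it, a minimal sketch of passing the buffer size, assuming a GNU sort recent enough to accept -S (older builds may not). `sort_with_buffer` is a hypothetical wrapper name, and LC_ALL=C is my own addition to get plain byte-order comparison.

```shell
# sort_with_buffer BUFFER_SIZE FILE
# Hypothetical wrapper: run sort with an explicit in-memory buffer size.
# -S is the short form of GNU sort's --buffer-size; sizes such as 64M or 1G
# are accepted.
sort_with_buffer() {
    buffer_size=$1
    input=$2
    # LC_ALL=C forces byte-wise comparison, which avoids locale collation cost.
    LC_ALL=C sort -S "$buffer_size" -- "$input"
}

# Usage: sort_with_buffer 1G big.txt > big.sorted.txt
```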

      Are sort operations usually CPU bound?

      print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
        "It does take the -S $BUFFER_SIZE flag."

        I guess you must have a later version of gnu sort than I?

        C:\test>sort --version
        sort (GNU textutils) 2.0
        Written by Mike Haertel.
        "Are sort operations usually CPU bound?"

        The in-memory sort stages certainly are. If you overlap the sorting of one batch with the reading of the second, you save some time.

        Have one thread reading or writing and 1/3/7/15/... threads sorting, and you save more time.
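A rough approximation of the idea using nothing but the shell, with processes standing in for threads: split the input, sort the pieces concurrently as background jobs, then merge. This is a sketch under stated assumptions, not the implementation described above. `parallel_chunk_sort` is a hypothetical name, the input is assumed to be non-empty, and the split happens up front rather than overlapping with the sorting.

```shell
# parallel_chunk_sort FILE CHUNK_LINES
# Hypothetical helper: sort FILE by sorting fixed-size chunks in parallel
# background processes and then merging the sorted chunks to stdout.
parallel_chunk_sort() {
    input=$1
    chunk_lines=$2
    workdir=$(mktemp -d) || return 1

    # Cut the input into pieces of CHUNK_LINES lines: chunk.aa, chunk.ab, ...
    split -l "$chunk_lines" "$input" "$workdir/chunk."

    # Sort every piece in its own background process so that the in-memory
    # sorts can run on separate cores.
    for chunk in "$workdir"/chunk.*; do
        LC_ALL=C sort -o "$chunk.sorted" -- "$chunk" &
    done
    wait    # block until all background sorts have finished

    # -m merges inputs that are already sorted instead of re-sorting them.
    LC_ALL=C sort -m -- "$workdir"/chunk.*.sorted
    rm -rf "$workdir"
}

# Usage: parallel_chunk_sort big.txt 1000000 > big.sorted.txt
```

Note that this starts one process per chunk with no upper limit, so CHUNK_LINES should be chosen so that the number of chunks is close to the number of cores.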


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.