in reply to external sort performance improved?

This will do the same job, and probably substantially faster:

\windows\system32\sort /m 5242880 sort_input.txt /O sort_output.txt

Update: On the nearest somewhat equivalent file I had -- 6GB -- it took 14 minutes.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^2: external sort performance improved?
by rkshyam (Acolyte) on Apr 16, 2012 at 11:22 UTC

    I am currently hitting an insufficient-memory issue when I run this command on my system. I will debug this issue and let you know. By the way, can this command be used in a script and run? Or is it a Perl one-liner? Also, how does it differ from the external sort that I have used? Please clarify.

      I am currently hitting an insufficient-memory issue when I run this command on my system.

      With 12GB of memory and a 5GB file, this should not be happening.

      When you hit an error, if you post the error message you receive -- cut&paste rather than paraphrased -- you may get a quick solution to your problem.

      By the way, can this command be used in a script and run?

      What kind of script?

      Or is it a Perl one-liner?

      It is a bog-standard Windows command.

      It can be invoked from the command line, from a batch script, from a Perl script, or in any other way a system command can be invoked.
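As a rough illustration of "any other way a system command can be invoked", here is a minimal Python sketch that builds the same command line shown at the top of the thread and hands it to the operating system. The helper names `build_sort_command` and `run_sort` are hypothetical, and the sketch assumes a Windows machine where `\windows\system32\sort` resolves on the current drive.

```python
import subprocess

# Path to the Windows sort utility, as written in the original command.
SORT_EXE = r"\windows\system32\sort"


def build_sort_command(input_path, output_path, memory_kb):
    """Return the argument list for the Windows sort command.

    /m takes the amount of memory to use, in kilobytes;
    /O names the output file.
    """
    return [SORT_EXE, "/m", str(memory_kb), input_path, "/O", output_path]


def run_sort(input_path, output_path, memory_kb):
    """Run the sort and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_sort_command(input_path, output_path, memory_kb), check=True)


# Example call (Windows only), equivalent to the command in the first post:
# run_sort("sort_input.txt", "sort_output.txt", 5242880)
```

Keeping the argument list in its own function makes it easy to print or log the exact command before running it, which helps when you need to post an error message verbatim.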

      Also, how does it differ from the external sort that I have used?

      The Perl script you showed calls back into Perl for every comparison, and (unnecessarily) re-splits two lines for every comparison.

      Assuming your example snippet lines are representative of the whole file, and assuming an average of N*log2(N) comparisons is required to sort your file, that means you are calling back into Perl 1.5 billion times and re-splitting lines 3 billion times.
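The arithmetic behind those figures can be sketched as follows. The function name `estimate_callbacks` is hypothetical, and the line count used in the usage comment (58 million) is an assumption chosen only because it yields roughly 1.5 billion comparisons; the thread does not state the actual number of lines.

```python
import math


def estimate_callbacks(line_count):
    """Estimate the work done by a comparison sort with a Perl callback.

    Assumes an average of N*log2(N) comparisons, and two line splits
    per comparison (one for each of the two lines being compared).
    Returns (comparisons, splits) as floats.
    """
    comparisons = line_count * math.log2(line_count)
    splits = 2 * comparisons
    return comparisons, splits


# Hypothetical usage: an assumed 58 million lines gives about
# 1.5 billion comparisons and about 3 billion splits.
# comparisons, splits = estimate_callbacks(58_000_000)
```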

      It is unsurprising that a dedicated sort utility that doesn't need to do either of those things will run more quickly.

      Please clarify

      You are sorting your data by the one field that appears at the beginning of each record; therefore there is no need to split the records in order to sort them correctly.
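To illustrate sorting on a leading field without splitting, here is a small in-memory Python sketch. It assumes every record starts with a fixed-width timestamp of the form `YYYY/MM/DD @ HH:MM:SS,mmm` (25 characters), as in the sample lines posted in this thread; `KEY_WIDTH` and `sort_by_prefix` are hypothetical names. It is meant for small inputs only, not as a replacement for an external sort on a multi-gigabyte file.

```python
# Assumed width of the leading "YYYY/MM/DD @ HH:MM:SS,mmm" timestamp.
KEY_WIDTH = 25


def sort_by_prefix(lines, width=KEY_WIDTH):
    """Sort records by their leading fixed-width field only.

    The key is a plain slice of the line, so no split is needed.
    Python's sorted() is stable, so records with identical keys
    keep their original relative order.
    """
    return sorted(lines, key=lambda line: line[:width])
```

Because only the prefix is compared, this differs from a whole-line sort: the text after the timestamp never influences the order of records that share the same timestamp.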



        Thanks BrowserUK for your response:

        1> I am getting the warning "Warning: the specified memory size is being reduced to the available paging memory". The paging file size of this system is set to 12284 MB. (Even when I run this on a W2K8 64-bit system with 12GB RAM.) (Let me know if I need to change the paging file size.)

        2> The output of the sort you mentioned and the output of my code differ when the date and time are the same. With the sort option you provided, the entire line is considered for sorting, and hence the text after the date and time also gets sorted (I don't want the text lines to be sorted when the date and time are the same). Is there a way to do it? I have attached the output of both sorts.

        Your sort:

        2012/12/13 @ 19:00:27,792 @ ,, at com.
        2012/12/13 @ 19:00:27,792 @ ,, at com.
        2012/12/13 @ 19:00:27,792 @ ,, at com.
        2012/12/13 @ 19:00:27,792 @ ,, at java.lang.reflect.Method.invoke(Method.java:597)
        2012/12/13 @ 19:00:27,792 @ ,, at java.lang.Thread.run(Thread.java:662)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.reflect.GeneratedMethodAccessor1387.invoke(Unknown Source)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.rmi.transport.Transport$1.run(Transport.java:159)

        My sort:

        2012/12/13 @ 19:00:27,792 @ ,, at com.
        2012/12/13 @ 19:00:27,792 @ ,, at com.
        2012/12/13 @ 19:00:27,792 @ ,, at com.
        2012/12/13 @ 19:00:27,792 @ ,, at sun.reflect.GeneratedMethodAccessor1387.invoke(Unknown Source)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        2012/12/13 @ 19:00:27,792 @ ,, at java.lang.reflect.Method.invoke(Method.java:597)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.rmi.transport.Transport$1.run(Transport.java:159)
        2012/12/13 @ 19:00:27,792 @ ,, at java.lang.Thread.run(Thread.java:662)