
I am currently hitting an insufficient memory issue when I run this command on my system.

With 12GB of memory and a 5GB file, this should not be happening.

When you hit an error, if you post the error message you receive -- cut&paste rather than paraphrased -- you may get a quick solution to your problem.

By the way, can this command be used in a script and run?

What kind of script?

Or is it a Perl one-liner?

It is a bog-standard Windows command.

It can be invoked: from the command line; from a batch script; from a perl script; or in any other way a system command can be invoked.

Also, how does the external sort that I have used differ from what you have now mentioned?

The Perl script you showed calls back into Perl for every comparison; and (unnecessarily) re-splits two lines for every comparison.

Assuming your example snippet lines are representative of the whole file; and assuming an average of N*log2(N) comparisons are required to sort your file, that means you are calling back into Perl 1.5 billion times and re-splitting lines 3 billion times.
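
The arithmetic behind those figures can be sketched as follows. The thread does not state the line count, so the 90-byte average line length here is purely my assumption, picked because it lands close to the quoted numbers; the 5GB file size is from the thread.

```python
import math

FILE_BYTES = 5 * 1024**3   # 5GB file, as stated in the thread
AVG_LINE_BYTES = 90        # assumption: not given in the thread


def estimate_sort_cost(file_bytes, avg_line_bytes):
    """Estimate lines, comparisons and splits for a comparison sort."""
    n_lines = file_bytes // avg_line_bytes
    # Average-case comparison count for an N*log2(N) sort.
    comparisons = n_lines * math.log2(n_lines)
    # Each comparison re-splits both of the lines being compared.
    splits = 2 * comparisons
    return n_lines, comparisons, splits


if __name__ == "__main__":
    n, c, s = estimate_sort_cost(FILE_BYTES, AVG_LINE_BYTES)
    print(f"{n:,} lines")
    print(f"{c / 1e9:.2f} billion comparisons")
    print(f"{s / 1e9:.2f} billion splits")
```

With these assumed inputs the estimate comes out at roughly 1.5 billion comparisons and 3 billion splits; a different average line length shifts both numbers proportionally.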

It is unsurprising that a dedicated sort utility that doesn't need to do either of those things will run more quickly.

Please clarify

You are sorting your data by the one field that appears at the beginning of each record, therefore there is no need to split the records in order to sort them correctly.
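
A small sketch of why that holds, assuming the records start with a zero-padded, fixed-width timestamp as in the lines shown later in the thread. The message text and the `timestamp` helper are made up for illustration; the helper stands in for a split-based key, not for the original script.

```python
# Sample lines in the thread's format; the trailing text is invented.
lines = [
    "2012/12/13 @ 19:00:27,792 @ ,, second",
    "2012/12/13 @ 09:15:02,001 @ ,, first",
    "2012/12/14 @ 00:00:00,000 @ ,, third",
]


def timestamp(line):
    # Hypothetical split-based key: the date and time parts of the record.
    date, time, _rest = line.split(" @ ", 2)
    return (date, time)


by_split = sorted(lines, key=timestamp)
by_whole_line = sorted(lines)  # plain string comparison, no splitting

# Both approaches put the records in the same chronological order.
print(by_split == by_whole_line)
```

Because the timestamp is the leading, fixed-width part of each line, comparing whole lines as strings orders them chronologically without any splitting. The two approaches can only disagree on lines whose timestamps are identical.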


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^4: external sort performance improved?
by rkshyam (Acolyte) on Apr 17, 2012 at 09:00 UTC
    Thanks BrowserUK for your response:

    1> I am getting the warning "Warning: the specified memory size is being reduced to the available paging memory". The paging file size of this system is set to 12284 MB. (Even when I run this on a w2k8 64-bit 12GB RAM system.) Let me know if I need to change the paging file size.

    2> The output of the sort you mentioned and the output of my code differ when the date and time are the same. With the sort option you provided, the entire line is considered for sorting, and hence the text after the date and time also gets sorted. I don't want the text lines to be sorted when the date and time are the same. Is there a way to do it? I have attached the output of both sorts.

    Your sort:

        2012/12/13 @ 19:00:27,792 @ ,, at com.
        2012/12/13 @ 19:00:27,792 @ ,, at com.
        2012/12/13 @ 19:00:27,792 @ ,, at com.
        2012/12/13 @ 19:00:27,792 @ ,, at java.lang.reflect.Method.invoke(Method.java:597)
        2012/12/13 @ 19:00:27,792 @ ,, at java.lang.Thread.run(Thread.java:662)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.reflect.GeneratedMethodAccessor1387.invoke(Unknown Source)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.rmi.transport.Transport$1.run(Transport.java:159)

    My sort:

        2012/12/13 @ 19:00:27,792 @ ,, at com.
        2012/12/13 @ 19:00:27,792 @ ,, at com.
        2012/12/13 @ 19:00:27,792 @ ,, at com.
        2012/12/13 @ 19:00:27,792 @ ,, at sun.reflect.GeneratedMethodAccessor1387.invoke(Unknown Source)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        2012/12/13 @ 19:00:27,792 @ ,, at java.lang.reflect.Method.invoke(Method.java:597)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
        2012/12/13 @ 19:00:27,792 @ ,, at sun.rmi.transport.Transport$1.run(Transport.java:159)
        2012/12/13 @ 19:00:27,792 @ ,, at java.lang.Thread.run(Thread.java:662)
      I am getting the warning as "Warning: the specified memory size is being reduced to the available paging memory". The paging file size of this system is set to 12284 MB. (Even when I run this on w2k8 64 bit 12GB RAM system)

      That is just a warning, it doesn't prevent the sort from working. I'm not sure if it is a bug in the way the program determines the amount of memory available; or if the "paging memory" it talks of is some specialised subset of the available memory.

      Either way, when you get that warning, it means the program will use the maximum amount it thinks it can use.

      The output of sort which you have mentioned and output of my code differs ...

      That's unfortunate. sort.exe doesn't have a way to restrict the key length.

      The next fastest solution would be to download GNU CoreUtils and either put the entire package in your path, or just put sort.exe (and its dependencies: libintl3.dll & libiconv2.dll) somewhere in your path, and use the command:

      sort -S 3G -k 1,26 dataf -o dataf.sorted

      (Note: This sort utility is a pre-compiled 32-bit binary, so 3 GB is the maximum it can handle.)

      The sort will be substantially slower than with the Windows-supplied sort, but should be quicker than your Perl script.
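
      The behaviour being asked for is a stable sort on a fixed-length prefix: lines are ordered by the leading timestamp only, and lines with equal timestamps stay in their input order. A minimal in-memory sketch of that semantics, using two lines from the thread plus one invented line, is below. It illustrates the ordering only and is not a replacement for an external sort of a 5GB file; `KEY_LEN = 25` is my assumption for the timestamp width.

```python
KEY_LEN = 25  # assumption: width of "YYYY/MM/DD @ HH:MM:SS,mmm"


def sort_by_prefix(lines, key_len=KEY_LEN):
    # sorted() is stable: lines whose keys compare equal keep input order.
    return sorted(lines, key=lambda line: line[:key_len])


lines = [
    "2012/12/13 @ 19:00:27,792 @ ,, at sun.rmi.transport.Transport$1.run(Transport.java:159)",
    "2012/12/13 @ 19:00:27,792 @ ,, at java.lang.Thread.run(Thread.java:662)",
    "2012/12/13 @ 19:00:27,791 @ ,, (hypothetical earlier line)",
]

for line in sort_by_prefix(lines):
    print(line)
```

      The invented earlier line moves to the front, while the two 19:00:27,792 lines keep their original relative order instead of being reordered by the text that follows the timestamp. With GNU sort, my understanding is that `-k 1,26` selects whitespace-separated fields 1 through 26 rather than characters, and that ties fall back to a whole-line comparison unless `-s` is given. A character-position key such as `-k 1.1,1.26` combined with `-s` may therefore be closer to this behaviour, but that should be checked against the installed version.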



        I have installed CoreUtils and added the path entry as mentioned. I ran the commands

        <1> sort -S 3G -k 1,26 sort_input.txt
        <2> sort -S 3G -k 1,26 sort_input.txt -o sort_output.txt

        but this results in the error "Input file specified two times".