in reply to sorting very large text files
The sort buffer defaults to:
buffer_size = min(1/8 * physical_RAM, free_memory)
That's somewhat conservative, especially if the machine you are using is not heavily loaded. Increasing the buffer size will make the sort much faster. For instance:
sort -S 3.5G ...
Another way to speed up the operation is to reduce the size of the file by using a more compact encoding. For instance, representing numbers in binary format instead of as ASCII strings can reduce their size to roughly 1/5; DNA sequences can be reduced to 1/4; enumerations to 1 or 2 bytes, etc.
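As a rough illustration of the binary-number idea, here is a minimal sketch assuming the values are unsigned integers that fit in 32 bits. The helper names encode_u32 and decode_u32 are made up for this example. Big-endian packing is used because it keeps byte-wise order identical to numeric order, so the packed keys can be compared as plain strings.

```perl
use strict;
use warnings;

# Hypothetical helpers (not from the original post): store an unsigned
# 32-bit integer as 4 big-endian bytes instead of up to 10 ASCII digits.
sub encode_u32 { return pack 'N', $_[0] }
sub decode_u32 { return unpack 'N', $_[0] }

my @numbers = (4000000000, 10, 256, 7);

# Byte-wise string comparison of the packed keys gives numeric order.
my @sorted = map  { decode_u32($_) }
             sort { $a cmp $b }
             map  { encode_u32($_) } @numbers;

print "@sorted\n";                             # 7 10 256 4000000000
print length(encode_u32(4000000000)), "\n";    # 4
```

Note that encode_u32(10) ends in the byte 0x0A, which is a newline, so packed records cannot be fed to a line-oriented sort without some form of escaping.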
This kind of compaction can introduce "\n" characters into the stream, and they need to be escaped because sort treats them as record separators. A simple way is to perform the following expansion:
my %expand = ( "\n" => "\x11\x11", "\x11" => "\x11\x12" );  # "\n" is \x0A, not \x10
s/([\n\x11])/$expand{$1}/g;
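After sorting, the expansion has to be undone. The following is a sketch of a matching round trip, assuming the byte being hidden is the newline "\n" and that "\x11" serves as the escape byte. The function names escape_record and unescape_record are invented for this example.

```perl
use strict;
use warnings;

# Assumption: "\n" is the byte to hide and "\x11" is the escape byte.
my %expand   = ("\n" => "\x11\x11", "\x11" => "\x11\x12");
my %collapse = reverse %expand;    # two-byte sequence => original byte

# Replace every newline or escape byte with its two-byte sequence.
sub escape_record {
    my ($rec) = @_;
    $rec =~ s/([\n\x11])/$expand{$1}/g;
    return $rec;
}

# Scan left to right and turn each two-byte sequence back into one byte.
sub unescape_record {
    my ($rec) = @_;
    $rec =~ s/(\x11[\x11\x12])/$collapse{$1}/g;
    return $rec;
}

my $raw     = pack('N', 10) . "\x11tail";   # contains both "\n" and "\x11"
my $escaped = escape_record($raw);

print index($escaped, "\n") == -1 ? "no newline left\n" : "newline remains\n";
print unescape_record($escaped) eq $raw ? "round trip ok\n" : "round trip failed\n";
```

Because every "\x11" in the escaped stream starts a two-byte sequence, the left-to-right substitution in unescape_record cannot misalign.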