rkshyam has asked for the wisdom of the Perl Monks concerning the following question:
I have written a sorting script using an external sort (Sort::External); the code is below. It is currently used to sort a large file of nearly 5 GB, and the sort takes 32 minutes on 64-bit Windows (64-bit ActivePerl) with 12 GB of RAM. I would like to know whether the performance can be improved, or whether this is already good performance. Does my code need any modification? I have also attached sample data to sort. Please help.
input lines:

2012/12/13 @ 13:32:35,585 @ ,, INFO [EJB3Deployer] Starting java:comp + multiplexer
2012/12/13 @ 13:32:34,585 @ ,, INFO [EJB3Deployer] Starting java:comp + multiplexer
2012/12/13 @ 12:32:35,485 @ ,, INFO [EJB3Deployer] Starting java:comp + multiplexer
2012/12/13 @ 13:35:35,585 @ ,, INFO [EJB3Deployer] Starting java:comp + multiplexer
2012/12/13 @ 14:32:35,585 @ ,, INFO [EJB3Deployer] Starting java:comp + multiplexer
2012/12/15 @ 13:32:35,612 @ ,, INFO [EJB3Deployer] Starting java:comp + multiplexer
2012/12/12 @ 11:32:35,585 @ ,, INFO [EJB3Deployer] Starting java:comp + multiplexer
2012/10/13 @ 13:32:45,735 @ ,, INFO [EJB3Deployer] Starting java:comp + multiplexer

$\ = "\n";    # output record separator: every print ends with a newline
$, = "\t";    # output field separator: print list items are tab-joined
use strict;
use warnings;
use Sort::External;

print "Script start time is\n ", scalar localtime();

# Three-argument open with error checks; the output file is opened for append.
open( my $in,  '<',  'sort_input.txt' )  or die "Cannot open sort_input.txt: $!";
open( my $out, '>>', 'sort_output.txt' ) or die "Cannot open sort_output.txt: $!";

# Compare lines by the text before the first ',,' (the timestamp field).
my $sortscheme = sub {
    my @flds_a = split( /,,/, $Sort::External::a );
    my @flds_b = split( /,,/, $Sort::External::b );
    $flds_a[0] cmp $flds_b[0];
};

#my $temp_directory = '/home/david/temp';
my $sortex = Sort::External->new(
    mem_threshold => 1024**2 * 16,
    sortsub       => $sortscheme,
    #working_dir  => $temp_directory,
);

while (<$in>) {
    chomp;
    $sortex->feed($_);
}
$sortex->finish;

while ( defined( $_ = $sortex->fetch ) ) {
    print $out $_;
}

close $in;
close $out;
print "Script end time is\n ", scalar localtime();
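One thing that may be worth checking before tuning anything else: the comparator in the question runs two split calls on every comparison. If the timestamp before ',,' is always fixed-width and zero-padded (an assumption based on the sample data, not something verified against the full 5 GB file), a plain string comparison of whole lines should give the same order, apart from how lines with identical timestamps are ordered. The sketch below only uses Perl's built-in sort on the sample lines to compare the two orderings; it does not use Sort::External itself. As far as I understand the module's documentation, leaving out sortsub makes it fall back to a default lexical sort, which is the case this is meant to mimic.

```perl
use strict;
use warnings;

# Sample lines from the question; the timestamp before ',,' is fixed-width.
my $tail  = ' @ ,, INFO [EJB3Deployer] Starting java:comp + multiplexer';
my @lines = map { $_ . $tail } (
    '2012/12/13 @ 13:32:35,585',
    '2012/12/13 @ 13:32:34,585',
    '2012/12/13 @ 12:32:35,485',
    '2012/12/13 @ 13:35:35,585',
    '2012/12/13 @ 14:32:35,585',
    '2012/12/15 @ 13:32:35,612',
    '2012/12/12 @ 11:32:35,585',
    '2012/10/13 @ 13:32:45,735',
);

# Same comparison as the question's sortsub: split on ',,', compare field 0.
my @by_split = sort { ( split /,,/, $a )[0] cmp( split /,,/, $b )[0] } @lines;

# Plain lexical sort of the whole line, with no per-comparison split.
my @by_line = sort @lines;

my $same = join( "\n", @by_split ) eq join( "\n", @by_line );
print $same ? "same order\n" : "different order\n";
print "$_\n" for @by_line;
```

If the two orderings agree on a representative slice of the real data, dropping sortsub from the Sort::External->new call would be a cheap experiment to time against the current 32 minutes.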
Replies are listed 'Best First'.
Re: external sort performance improved?
by BrowserUk (Patriarch) on Apr 16, 2012 at 09:41 UTC
by rkshyam (Acolyte) on Apr 16, 2012 at 11:22 UTC
by BrowserUk (Patriarch) on Apr 16, 2012 at 11:44 UTC
by rkshyam (Acolyte) on Apr 17, 2012 at 09:00 UTC
by BrowserUk (Patriarch) on Apr 17, 2012 at 10:22 UTC