Hi, I have a performance problem with Perl's sort. My current approach builds a hash of key/value pairs and then sorts the keys on the date and time fields using a Schwartzian Transform (ST). The input file is normally around 3-4 GB.

The code I have posted works, but it takes around 3 hours to complete the sort on a 64-bit Windows system with 12 GB of memory. (If I run the same script on a 32-bit Windows system with 4 GB of RAM, it fails with an out-of-memory error.)

The actual requirement is to sort this file and then split it into a number of smaller files, because a 3 GB file cannot be opened. The file-splitting section works correctly.

Can the performance be improved and the out-of-memory issue resolved? Thanks in advance; any help is greatly appreciated.

The sample input file content is:

2012/02/12 @ 14:29:26,519 @     ->     java.lang.NullPointerException

2012/02/12 @ 14:23:26,519 @  ->  |  WARN | RMI TCP Connection(184923)-170.80.0.9 | Error in getting the Network Adapter

2012/02/12 @ 14:20:26,522 @  ->  |  WARN | RMI TCP Connection(184923)-170.80.0.9 | Error in getting the  Network Adapter
and output should look like:

2012/02/12 @ 14:20:26,522 @  ->  |  WARN | RMI TCP Connection(184923)-170.80.0.9 | Error in getting the  Network Adapter

2012/02/12 @ 14:23:26,519 @  ->  |  WARN | RMI TCP Connection(184923)-170.80.0.9 | Error in getting the Network Adapter

2012/02/12 @ 14:29:26,519 @     ->     java.lang.NullPointerException
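
Since the timestamps in the sample are fixed-width and zero-padded (YYYY/MM/DD and HH:MM:SS,mmm), a plain string comparison of the leading 25 characters should already order the records chronologically. The following is only a sketch of that idea on the three sample records. `sort_records` is a hypothetical helper name, and it assumes that each record sits on a single line and always starts with a timestamp in exactly this layout.

```perl
use strict;
use warnings;

# Hypothetical helper: sort one-line records by their leading timestamp.
# Assumes every record starts with "YYYY/MM/DD @ HH:MM:SS,mmm" (25 characters,
# zero-padded), so that string order equals chronological order.
sub sort_records {
    my @records = @_;
    return sort { substr($a, 0, 25) cmp substr($b, 0, 25) } @records;
}

# The three sample records, in their original (unsorted) order.
my @sample = (
    '2012/02/12 @ 14:29:26,519 @     ->     java.lang.NullPointerException',
    '2012/02/12 @ 14:23:26,519 @  ->  |  WARN | RMI TCP Connection(184923)-170.80.0.9 | Error in getting the Network Adapter',
    '2012/02/12 @ 14:20:26,522 @  ->  |  WARN | RMI TCP Connection(184923)-170.80.0.9 | Error in getting the  Network Adapter',
);

print "$_\n" for sort_records(@sample);
```

If the assumption holds, the records are printed in the 14:20, 14:23, 14:29 order of the expected output, without splitting each key into separate date and time fields.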

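open my $fh_duplicate, '<', $file_duplicate
    or die "Cannot open $file_duplicate: $!";
open my $fh_sorting, '>>', $file_consolidated_sort
    or die "Cannot open $file_consolidated_sort: $!";

### hash creation: key is the text before ",,", value is the text after it
my %hash;
while (<$fh_duplicate>) {
    chomp;
    my ($key, $val) = split /,,/;
    $hash{$key} .= $val;    # values of lines sharing a key are concatenated
}
close $fh_duplicate;

### sorting begins: ST on the date (field 0), then the time (field 2)
for my $key (
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }
    map  { [ $_, (split)[0], (split)[2] ] } keys %hash
) {
    print $fh_sorting "$key -> $hash{$key}";
}

close $fh_sorting;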
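
The code above keeps every value in `%hash` and also builds one anonymous array per key, which is probably where the memory goes on a 3-4 GB input. One possible way to reduce that, sketched here under assumptions, is to keep only each line's key and byte offset in memory, sort that index, and then copy the lines out by seeking. `sort_by_offset` is a hypothetical helper name, not part of the original script. The sketch also assumes that comparing the whole key as a string gives the wanted order, and, unlike the code above, it does not merge lines that share a key or rewrite them as "key -> value".

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out with lines ordered by the text before
# the first ",," separator. Only keys and byte offsets are held in memory.
# Lines that share a key are NOT merged, and lines are written unchanged.
sub sort_by_offset {
    my ($in, $out) = @_;
    open my $in_fh,  '<', $in  or die "Cannot open $in: $!";
    open my $out_fh, '>', $out or die "Cannot open $out: $!";
    binmode $in_fh;     # raw bytes, so tell() and seek() agree on offsets
    binmode $out_fh;

    my @index;                          # one [key, offset] pair per line
    my $offset = tell $in_fh;
    while (my $line = <$in_fh>) {
        my ($key) = split /,,/, $line, 2;
        push @index, [ $key, $offset ];
        $offset = tell $in_fh;
    }

    # Sort the small index, then fetch each line from disk in sorted order.
    for my $entry (sort { $a->[0] cmp $b->[0] } @index) {
        seek $in_fh, $entry->[1], 0 or die "seek failed: $!";
        my $line = <$in_fh>;
        $line .= "\n" unless $line =~ /\n\z/;   # last line may lack a newline
        print {$out_fh} $line;
    }

    close $in_fh;
    close $out_fh or die "Cannot close $out: $!";
}
```

It would be called as `sort_by_offset($file_duplicate, $file_consolidated_sort);`. The random seeks may be slow on a file this large, so this trades speed for memory and would need to be measured on real data.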

In reply to perl ST sort performance issue for large file? by rkshyam
