Hi, I have a performance problem with a Perl sort. My current approach reads the file into a hash of key/value pairs and then sorts the keys with the ST (Schwartzian Transform) technique on two fields, date and time. The input file is normally around 3-4 GB. The code posted below works, but the sort takes around 3 hours on a 64-bit Windows system with 12 GB of memory. (The same script on a 32-bit Windows system with 4 GB of RAM fails with an out-of-memory error.)

The actual requirement is to sort this file and then split it into a number of smaller files, because a 3 GB file cannot be opened. The file-splitting part works correctly. Can the sort performance be improved, and can the out-of-memory problem be resolved? Thanks in advance; any help is greatly appreciated.
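
The splitting step itself is not shown in the post. Here is a minimal sketch of one way to do it. It assumes the sorted output is line-oriented and that a fixed number of lines per part is acceptable. The sub name `split_file`, the `.partNNN` file suffix and the line limit are hypothetical choices, not the poster's code. It streams the input, so only one line is held in memory at a time.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $src into numbered part files of at most
# $max_lines lines each, reading one line at a time.
sub split_file {
    my ($src, $max_lines) = @_;
    open my $in, '<', $src or die "Cannot open $src: $!";

    my ($out, @parts);
    my $count = 0;
    while (my $line = <$in>) {
        # Start a new part before the first line and whenever the limit is hit
        if (!$out || $count == $max_lines) {
            close $out if $out;
            my $name = sprintf '%s.part%03d', $src, scalar(@parts) + 1;
            open $out, '>', $name or die "Cannot open $name: $!";
            push @parts, $name;
            $count = 0;
        }
        print {$out} $line;
        $count++;
    }
    close $out if $out;
    close $in;
    return @parts;    # names of the part files written
}
```

For example, `split_file('consolidated_sort.log', 1_000_000)` (hypothetical file name and limit) would write `consolidated_sort.log.part001`, `.part002`, and so on.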

The sample input file content is:

    2012/02/12 @ 14:29:26,519 @ -> java.lang.NullPointerException
    2012/02/12 @ 14:23:26,519 @ -> | WARN | RMI TCP Connection(184923)-170.80.0.9 | Error in getting the Network Adapter
    2012/02/12 @ 14:20:26,522 @ -> | WARN | RMI TCP Connection(184923)-170.80.0.9 | Error in getting the Network Adapter

and the output should look like:

    2012/02/12 @ 14:20:26,522 @ -> | WARN | RMI TCP Connection(184923)-170.80.0.9 | Error in getting the Network Adapter
    2012/02/12 @ 14:23:26,519 @ -> | WARN | RMI TCP Connection(184923)-170.80.0.9 | Error in getting the Network Adapter
    2012/02/12 @ 14:29:26,519 @ -> java.lang.NullPointerException
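
As a quick check of the expected ordering, here is a small self-contained sketch that applies the same date-then-time Schwartzian Transform the post uses directly to the three sample lines. It assumes each record sits on one line in the layout shown, so that whitespace field 0 is the date and field 2 is the time. Unlike the posted code, it splits each line once instead of twice.

```perl
use strict;
use warnings;

# The three sample records from the post, one record per line (assumed layout).
my @records = (
    '2012/02/12 @ 14:29:26,519 @ -> java.lang.NullPointerException',
    '2012/02/12 @ 14:23:26,519 @ -> | WARN | RMI TCP Connection(184923)-170.80.0.9 | Error in getting the Network Adapter',
    '2012/02/12 @ 14:20:26,522 @ -> | WARN | RMI TCP Connection(184923)-170.80.0.9 | Error in getting the Network Adapter',
);

# Schwartzian Transform: split each line once, sort on the cached
# date (field 0) and time (field 2), then drop the cached fields again.
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }
    map  { my @f = split; [ $_, $f[0], $f[2] ] }
    @records;

print "$_\n" for @sorted;
```

The printed order is 14:20:26,522, then 14:23:26,519, then 14:29:26,519, which matches the desired output above.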
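The code I am currently using:

    # Read "key,,value" lines and group the values by their timestamp key
    open my $in,  '<',  $file_duplicate         or die "Cannot open $file_duplicate: $!";
    open my $out, '>>', $file_consolidated_sort or die "Cannot open $file_consolidated_sort: $!";

    my %hash;
    while (<$in>) {
        chomp;
        my ($key, $val) = split /,,/;
        $hash{$key} .= $val;    # values sharing a timestamp are concatenated
    }
    close $in;
    ### hash creation done

    ### sorting begins: Schwartzian Transform on date (field 0), then time (field 2)
    for my $key (
        map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }
        map  { [ $_, (split)[0], (split)[2] ] }
        keys %hash
    ) {
        # chomp removed the newline on input, so add it back on output
        print {$out} "$key -> $hash{$key}\n";
    }
    close $out;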
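
One possible simplification, not something tested in the post: every key in the sample starts with a fixed-width, zero-padded `YYYY/MM/DD @ HH:MM:SS,mmm` timestamp. Under that assumption, a plain string sort of the whole key already orders records by date and then by time, so the extra arrays the Schwartzian Transform builds for every key are unnecessary. The keys and placeholder values below are hypothetical and follow the post's layout. The sketch only compares the two orderings; it does not measure speed or memory.

```perl
use strict;
use warnings;

# Hypothetical keys in the "YYYY/MM/DD @ HH:MM:SS,mmm @" layout used in the post;
# the values are placeholders.
my %hash = (
    '2012/02/12 @ 14:29:26,519 @' => 'placeholder 1',
    '2012/02/12 @ 14:23:26,519 @' => 'placeholder 2',
    '2012/02/11 @ 23:59:59,999 @' => 'placeholder 3',
);

# Schwartzian Transform, as in the posted code: date field, then time field.
my @st =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }
    map  { [ $_, (split)[0], (split)[2] ] }
    keys %hash;

# Plain string sort on the whole key: valid only because the timestamp
# fields are fixed-width and zero-padded, so string order equals time order.
my @plain = sort keys %hash;

print join("\n", @plain), "\n";
print "same order\n" if "@st" eq "@plain";
```

If any real key deviates from that layout (for example, an unpadded hour), the plain sort would no longer be equivalent.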

In reply to perl ST sort performance issue for large file? by rkshyam
