in reply to Re: perl ST sort performance issue for large file?
in thread perl ST sort performance issue for large file?

Just an update on this: as of now I am trying the sort option mentioned by salva:

    for my $key (sort keys %hash) { print FH1_sorting "$key -> $hash{$key}"; }

This is working, but the problem is that it consumes all of the system's 12GB of memory, and even after the sort the script does not terminate; it sits at 99% memory consumption for an indefinite amount of time. I have not looked at the GRT sort yet. Is there any solution for this 99% memory consumption, or should I concentrate on the GRT sort instead? Please advise.

Replies are listed 'Best First'.
Re^3: perl ST sort performance issue for large file?
by BrowserUk (Patriarch) on Mar 31, 2012 at 09:47 UTC

    It would be far easier to advise you if you would post the code you are using.

    Often, the most innocent looking pieces of code conceal things that unnecessarily consume memory.

    For example, the convenient: my @array = <$fh>; uses twice as much memory as:

    my @array; my $n = 0; $array[ $n++ ] = $_ while <$fh>;

    Because in the first version, <$fh> first creates a list of all the lines on the stack, which is then assigned to the array. For a brief time you have two copies of the entire file in memory, with the obvious increase in total memory requirement.

    In the second version, the array is populated directly, avoiding that problem.

    For the majority of uses, the first form is convenient and not a problem, but when you are butting your head against the capacity of your hardware, the change is worth the effort.
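    The difference between the two forms can be sketched as follows (a self-contained toy example; the temp file and its contents are illustrative, not from the original post):

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Build a small sample file so the sketch runs as-is.
    my ( $tmp, $path ) = tempfile();
    print $tmp "line $_\n" for 1 .. 5;
    close $tmp;

    # Form 1: slurp via a list -- briefly holds two copies of the data
    # (the list on the stack, then the array it is assigned to).
    open my $fh, '<', $path or die $!;
    my @slurped = <$fh>;
    close $fh;

    # Form 2: assign each line directly into the array; no intermediate list.
    open $fh, '<', $path or die $!;
    my @direct;
    my $n = 0;
    $direct[ $n++ ] = $_ while <$fh>;
    close $fh;

    print "identical\n" if "@slurped" eq "@direct";
    ```

    Both arrays end up with identical contents; only the peak memory during loading differs.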



      Here is the modified code to my earlier code. (Earlier also I have posted the code, and as per the perl monks' advice I have updated it and posted it here.)

      use strict;
      use warnings;
      use POSIX;

      my $file_duplicate         = "log_duplicate_remove";
      my $file_consolidated_sort = "log_consolidated_sort";

      open FH_duplicate, $file_duplicate or die "$!";              ### input file
      open FH_sorting, ">>$file_consolidated_sort" or die "$!";    #### output file

      my %hash = ();
      my ( $key, $val );

      while (<FH_duplicate>) {
          chomp;
          ( $key, $val ) = split /,,/;
          $hash{$key} .= $val;
      }
      close FH_duplicate;

      for $key ( sort keys %hash ) {
          print FH_sorting "$key -> $hash{$key}";
      }
      close FH_sorting;
Re^3: perl ST sort performance issue for large file?
by salva (Canon) on Mar 31, 2012 at 10:26 UTC
    then go for an external sort and do the duplicate consolidation as a postprocessing step:

    open my $fh, '-|', 'sort', $filename or die;

    chomp( my $line = <$fh> );
    my ( $prev_key, $prev_value ) = split /,,/, $line;
    while (<$fh>) {
        chomp;
        my ( $key, $value ) = split /,,/;
        if ( $prev_key eq $key ) {
            $prev_value .= $value;
        }
        else {
            print "$prev_key,,$prev_value\n";
            ( $prev_key, $prev_value ) = ( $key, $value );
        }
    }
    print "$prev_key,,$prev_value\n";

    Read the manual page for your OS sort utility in order to find out how to optimize its usage.
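    For GNU coreutils sort, for example, the buffer size and the spill directory are the two knobs that matter most for a multi-GB file (a sketch; the flag values are illustrative and should be tuned to your hardware):

    ```shell
    # Assuming GNU coreutils sort -- check `man sort` on your platform:
    #   -S SIZE   in-memory buffer size (e.g. 1G) before spilling to temp files
    #   -T DIR    directory for the temporary spill files (use a disk with space)
    #   LC_ALL=C  byte-wise comparison; much faster than locale-aware collation
    LC_ALL=C sort -S 1G -T . log_duplicate_remove > log_consolidated_sort
    ```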

    Or you could also use Sort::External.
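    A minimal sketch with Sort::External (a CPAN module, so it must be installed first; the memory threshold shown is an illustrative value, and the input format is assumed to be the same ",,"-separated records as above):

    ```perl
    use strict;
    use warnings;
    use Sort::External;    # CPAN module, not in core

    # Spill to disk once roughly 64MB of lines have been buffered.
    my $sortex = Sort::External->new( mem_threshold => 64 * 1024 * 1024 );

    open my $in, '<', 'log_duplicate_remove' or die $!;
    $sortex->feed($_) while <$in>;
    close $in;

    $sortex->finish;
    while ( defined( my $line = $sortex->fetch ) ) {
        print $line;    # lines come back in sorted order
    }
    ```

    The same consolidate-adjacent-duplicates loop from above can then be run over the fetched lines.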

      I am trying to run the code you have mentioned, but it results in "List form of pipe open not implemented at sort.pl line 19" on my Windows system. I came to know that the list form of pipe open is not supported on Windows, so I changed it to

      open $fh, "|sort $filename" or die;

      and the entire code looks like this:

      use strict;
      use warnings;

      my $filename = "sort_input";
      my ( $fh, $line, $key, $value, $prev_key, $prev_value );
      my $file_consolidated_sort = "sort_output";
      my $FH_sorting;

      #open $fh, '-|', 'sort', $filename or die;
      open $FH_sorting, ">>$file_consolidated_sort" or die "$!";
      open $fh, "|sort $filename" or die;

      chomp( $line = <$fh> );
      ( $prev_key, $prev_value ) = split /,,/, $line;
      while (<$fh>) {
          chomp;
          ( $key, $value ) = split /,,/;
          if ( $prev_key eq $key ) {
              $prev_value .= $value;
          }
          else {
              print $FH_sorting "$prev_key,,$prev_value\n";
              ( $prev_key, $prev_value ) = ( $key, $value );
          }
      }
      print $FH_sorting "$prev_key,,$prev_value\n";
      close $fh;
      close $FH_sorting;

      Another point is that I am not able to write the resulting output to the output file handle. All sorted output is printed on the screen itself, along with the warning/error "Use of uninitialized value $line in chomp at line 25, 26 etc".

      I have not tried sorting with the huge file (3GB), which is my actual requirement, so I have not tested whether performance will improve or not. In parallel I am looking into external sort.
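      The symptoms above point at the pipe direction: a leading "|" in a string open means "write TO the command", so sort's output goes to the screen and <$fh> returns undef. The trailing-"|" form reads FROM the command and works on Windows too. A self-contained sketch of the fix (the three sample records are hypothetical, created here only so the sketch runs as-is):

      ```perl
      use strict;
      use warnings;

      my $filename = 'sort_input';

      # Create a tiny sample input so the sketch is self-contained.
      open my $mk, '>', $filename or die $!;
      print $mk "b,,2\na,,1\na,,3\n";
      close $mk;

      # Trailing "|" = read FROM the command's stdout (string form,
      # so it also works where the list form '-|' is not implemented).
      open my $fh, "sort $filename |" or die "can't run sort: $!";

      # '>' rather than '>>' so reruns start from an empty output file.
      open my $out, '>', 'sort_output' or die $!;

      chomp( my $line = <$fh> );
      my ( $prev_key, $prev_value ) = split /,,/, $line;
      while (<$fh>) {
          chomp;
          my ( $key, $value ) = split /,,/;
          if ( $prev_key eq $key ) {
              $prev_value .= $value;    # consolidate duplicate keys
          }
          else {
              print $out "$prev_key,,$prev_value\n";
              ( $prev_key, $prev_value ) = ( $key, $value );
          }
      }
      print $out "$prev_key,,$prev_value\n";
      close $fh;
      close $out;
      ```

      With the sample input, sort_output ends up containing a,,13 and b,,2, one record per line.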