ramish has asked for the wisdom of the Perl Monks concerning the following question:
I am sorting a file that is more than 400 MB in size, using multiple keys. While executing, the sort fails with a core dump and sometimes an "illegal instruction" message.
I did some investigation and found out that it is due to insufficient memory.
I am running it on an AIX server, and when I ran ulimit -a it reported the memory limit as 65536 bytes. I have unlimited file size permission. When I reduced the input file to about 60 KB, the sort worked. I have checked, and I can't use malloc() or reset the limit at runtime.
I got another snippet of code from an earlier post on the same topic for sorting large files. The code is:

    use strict;
    use Encode qw(encode decode);

    ### Define the sort key here ###
    # Sorts in ascending order.
    sub key1 {
        ( substr( $a, 3, 17 )) cmp ( substr( $b, 3, 17 ));
    }
    # Sorts descending order.
    sub key2 {
        ( substr( $b, 20, 2 )) cmp ( substr( $a, 20, 2 ));
    }
    #
    ### Sort processing starts ###
    my @infile = <>;    # Reads file

    ### Multiple sort keys can be defined and sorted in the order of the key
    my @sorted = map { decode('cp1047', $_) }
                 sort { key1 || key2 }
                 (map { encode('cp1047', $_) } @infile);
    print @sorted;
I no longer face the memory problem, but it outputs fewer records than the input contains and the sort does not seem to be working:

    #!/usr/bin/perl -sw
    use vars qw/$N/;
    use strict;
    use sort "stable";
    use Encode qw(encode decode);
    no strict 'refs';
    $|++;

    sub key1 {
        ( substr( $a, 3, 17 )) cmp ( substr( $b, 3, 17 ));
    }
    # Sorts in descending order.
    sub key2 {
        ( substr( $b, 20, 2 )) cmp ( substr( $a, 20, 2 ));
    }

    my $reclen = 8072;    #! Adjust to suit your records/line ends.

    $N = $N || 1;

    warn "Usage: $0 [-N=n] file\n" and exit(-1) unless @ARGV;
    warn "Reading input file $ARGV[0] ", -s $ARGV[0], "\n";
    if ( not defined $ARGV[1] ) {
        warn "Output file not specified a Continue[N|y]?";
        exit -1 if <STDIN> !~ /^Y/i;
    }

    $/= \$reclen;
    open INPUT, '<', $ARGV[0] or die $!, $ARGV[0];
    binmode(INPUT);

    my (@fhs);
    while ( <INPUT> ) {
        my $key = substr($_, 3, $::N);
        if (not defined $fhs[$key]) {
            $fhs[$key] = "temp.$key";
            warn( "\rCreating file: $fhs[$key] ");
            open( $fhs[$key], ">$fhs[$key]") or die( "Could create $fhs[$key]: $!");
            binmode($fhs[$key]);
        }
        print {$fhs[$key]} $_;
    }

    #! Get rid of unused filehandles or those that reference zero length file
    @fhs = grep{ $_ and ! -z $_} @fhs;
    close $_ for @fhs;
    close INPUT;
    warn "Split made to: ", scalar @fhs, " files\n";

    #! Sort the split files on the first & second field
    for my $fh (@fhs) {
        warn "$fh: reading;...";
        open $fh, "<$fh" or die $!;
        binmode($fh);
        my @recs = <$fh>;
        close $fh;
        warn " sorting: ", scalar @recs, " recs;...";
        # @recs = sort{ substr($a, 3, 16) cmp substr($b, 3, 16)
        #            || substr($b, 20, 3) cmp substr($a, 20, 3) } @recs;
        my @recs = map { decode('cp1047', $_) }
                   sort { key1 || key2 }
                   (map { encode('cp1047', $_) } @recs);
        warn " writing;...";
        open $fh, ">$fh" or die $!;
        binmode($fh);
        print $fh @recs;
        close $fh;
        warn "done;\n";
    }

    warn "Merging files: ";
    *SORTED = *STDOUT;
    open SORTED, '>', $ARGV[1] and binmode(SORTED) or die $! if $ARGV[1];
    for my $fh (reverse @fhs) {
        warn " $fh;";
        open $fh, "<$fh" and binmode($fh) or die $!;
        print SORTED <$fh>;
        close $fh;
    }
    warn "\nClosing sorted file: sorted\n";
    close SORTED;
    warn "Deleting temp files\n";
    unlink $_ or warn "Couldn't unlink $_\n" for @fhs;
    warn "Done.\n";
    exit (0);
A couple of sample records are enclosed:
SK 1242 0180010100 AAR CPH AAR 0735 CPH 0810001 20070521200705211 SK 1242
SK 1242 0190010100 AAR CPH AAR 0735 CPH 0810001 2007052699999999 6 SK 1242
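
For comparison, the split, sort, and merge idea attempted above can be sketched for newline-terminated records as follows. This is a minimal sketch and not the code from the earlier post. It assumes each record is one line, that plain byte-order comparison is acceptable (the cp1047 conversion is left out), and that the key offsets (3, 17) and (20, 2) from the post are correct. The names field, by_keys, and sort_large_file are hypothetical. It reads lines rather than fixed-length blocks, keeps the temporary file handles in a hash so that non-numeric key characters each get their own bucket, merges the buckets in ascending order, and checks that the number of records written equals the number read.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Extract a fixed-width field; returns '' when the line is too short.
sub field {
    my ($line, $offset, $length) = @_;
    return $offset < length($line) ? substr($line, $offset, $length) : '';
}

# Primary key ascending, secondary key descending (offsets taken from the post).
sub by_keys {
    return field($a, 3, 17) cmp field($b, 3, 17)
        || field($b, 20, 2) cmp field($a, 20, 2);
}

sub sort_large_file {
    my ($in_path, $out_path) = @_;
    my $dir = tempdir(CLEANUP => 1);    # temp files are removed at exit
    my (%fh, %path);

    # Phase 1: split the input into one temp file per leading key character.
    open my $in, '<', $in_path or die "Cannot read $in_path: $!";
    my $read = 0;
    while (my $line = <$in>) {
        $line .= "\n" unless $line =~ /\n\z/;    # normalise an unterminated last line
        my $bucket = field($line, 3, 1);
        unless ($fh{$bucket}) {
            $path{$bucket} = length($bucket)
                ? sprintf('%s/bucket.%03d', $dir, ord $bucket)
                : "$dir/bucket.empty";
            open $fh{$bucket}, '>', $path{$bucket}
                or die "Cannot create $path{$bucket}: $!";
        }
        print { $fh{$bucket} } $line;
        $read++;
    }
    close $in;
    close($_) or die "Close failed: $!" for values %fh;

    # Phase 2: sort each bucket in memory and append the buckets in ascending
    # key order, which agrees with the ascending primary key in by_keys.
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    my $written = 0;
    for my $bucket (sort keys %fh) {
        open my $part, '<', $path{$bucket} or die "Cannot read $path{$bucket}: $!";
        my @lines = <$part>;
        close $part;
        my @recs = sort by_keys @lines;
        print {$out} @recs;
        $written += @recs;
    }
    close($out) or die "Close failed: $!";
    die "Record count mismatch: read $read, wrote $written\n" unless $read == $written;
    return $written;
}

# Usage: perl thisscript.pl input_file output_file
sort_large_file(@ARGV) if @ARGV == 2;
```

Each bucket must still fit within the memory limit. If one does not, bucketing on more than one leading character of the key is one option, at the cost of more temporary files being open at once. If the EBCDIC collating order matters, the same conversion would have to be applied both when sorting records and when ordering the bucket names.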

Replies are listed 'Best First'.

Re: Sort large files
by shmem (Chancellor) on Jun 06, 2007 at 13:28 UTC

Re: Sort large files
by Moron (Curate) on Jun 06, 2007 at 11:51 UTC
by ramish (Initiate) on Jun 06, 2007 at 15:01 UTC

Re: Sort large files
by zentara (Cardinal) on Jun 06, 2007 at 12:15 UTC

Re: Sort large files
by salva (Canon) on Jun 06, 2007 at 12:28 UTC
by Moron (Curate) on Jun 06, 2007 at 12:55 UTC
by salva (Canon) on Jun 06, 2007 at 15:19 UTC
by Moron (Curate) on Jun 06, 2007 at 17:57 UTC

Re: Sort large files
by andreas1234567 (Vicar) on Jun 06, 2007 at 13:42 UTC

Re: Sort large files (rep)
by tye (Sage) on Jun 06, 2007 at 14:56 UTC

Re: Sort large files
by swampyankee (Parson) on Jun 06, 2007 at 12:23 UTC

Re: Sort large files
by Anonymous Monk on Jun 06, 2007 at 13:14 UTC