I posted a question on EBCDIC sorting earlier (node id=616606) and got many replies. I tried one of the suggested snippets and it worked, but I am now facing another problem.

I am sorting a file that is more than 400 MB in size, using multiple keys. While executing, the sort fails with a core dump and sometimes an "illegal instruction" message.

I did some investigation and found out that it is due to insufficient memory.

I am running it on an AIX server, and when I ran ulimit -a it reported the memory limit as 65536 bytes. I have unlimited file size permission. When I reduced the input file to about 60 K, the sort worked. I have checked, and I can't use malloc() or reset the limit at runtime.

use strict;
use Encode qw(encode decode);

### Define the sort key here ###
# Sorts in ascending order.
sub key1 {
    ( substr( $a, 3, 17 )) cmp ( substr( $b, 3, 17 ));
}
# Sorts descending order.
sub key2 {
    ( substr( $b, 20, 2 )) cmp ( substr( $a, 20, 2 ));
}
#
### Sort processing starts ###
my @infile = <>;    # Reads file

### Multiple sort keys can be defined and sorted in the order of the key
my @sorted = map { decode('cp1047', $_) }
             sort { key1 || key2 }
             (map { encode('cp1047', $_) } @infile);
print @sorted;

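One way to cut memory use, sketched under assumptions: instead of holding every record in memory (plus the encoded and decoded copies), keep only a compact sort key and a byte offset per record, sort those, then re-read the records in sorted order. The helper names `index_entry` and `sort_file_by_index` are made up for illustration. The sketch assumes newline-terminated records of at least 22 bytes and a file under 4 GB, because the offset is packed as 32 bits. It saves the most when records are much longer than the 19 key bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Build one byte string per record (hypothetical helper):
#   EBCDIC key1, bit-inverted EBCDIC key2, then the record's byte offset.
# Inverting key2 makes a plain ascending sort order it descending.
sub index_entry {
    my ($record, $offset) = @_;
    my $key1 = encode('cp1047', substr($record, 3, 17));
    my $key2 = encode('cp1047', substr($record, 20, 2));
    return $key1 . ~$key2 . pack('N', $offset);
}

# Sort a file by its index entries and write the records to $out_fh.
# Assumes newline-terminated records.
sub sort_file_by_index {
    my ($in_path, $out_fh) = @_;
    open my $in, '<', $in_path or die "Cannot open $in_path: $!";

    my @index;
    my $offset = 0;
    while (my $line = <$in>) {
        push @index, index_entry($line, $offset);
        $offset = tell $in;          # start of the next record
    }

    # Default sort compares the byte strings, which is EBCDIC order here.
    for my $entry (sort @index) {
        my $pos = unpack 'N', substr($entry, -4);
        seek($in, $pos, 0) or die "Cannot seek in $in_path: $!";
        my $line = <$in>;
        print {$out_fh} $line;
    }
    close $in;
    return;
}

sort_file_by_index($ARGV[0], \*STDOUT) if @ARGV;
```

The re-read pass does one seek per record, so it trades memory for extra disk access.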
I got another snippet of code from an earlier post on the same topic, for sorting large files. The code is:

#!/usr/bin/perl -sw
use vars qw/$N/;
use strict;
use sort "stable";
use Encode qw(encode decode);
no strict 'refs';

$|++;

sub key1 {
    ( substr( $a, 3, 17 )) cmp ( substr( $b, 3, 17 ));
}
# Sorts in descending order.
sub key2 {
    ( substr( $b, 20, 2 )) cmp ( substr( $a, 20, 2 ));
}

my $reclen = 8072;    #! Adjust to suit your records/line ends.
$N = $N || 1;

warn "Usage: $0 [-N=n] file\n" and exit(-1) unless @ARGV;
warn "Reading input file $ARGV[0] ", -s $ARGV[0], "\n";

if ( not defined $ARGV[1] ) {
    warn "Output file not specified a Continue[N|y]?";
    exit -1 if <STDIN> !~ /^Y/i;
}

$/= \$reclen;
open INPUT, '<', $ARGV[0] or die $!, $ARGV[0];
binmode(INPUT);

my (@fhs);
while ( <INPUT> ) {
    my $key = substr($_, 3, $::N);
    if (not defined $fhs[$key]) {
        $fhs[$key] = "temp.$key";
        warn( "\rCreating file: $fhs[$key] ");
        open( $fhs[$key], ">$fhs[$key]")
            or die( "Could create $fhs[$key]: $!");
        binmode($fhs[$key]);
    }
    print {$fhs[$key]} $_;
}

#! Get rid of unused filehandles or those that reference zero length file
@fhs = grep{ $_ and ! -z $_} @fhs;
close $_ for @fhs;
close INPUT;
warn "Split made to: ", scalar @fhs, " files\n";

#! Sort the split files on the first & second field
for my $fh (@fhs) {
    warn "$fh: reading;...";
    open $fh, "<$fh" or die $!;
    binmode($fh);
    my @recs = <$fh>;
    close $fh;
    warn " sorting: ", scalar @recs, " recs;...";
    # @recs = sort{ substr($a, 3, 16) cmp substr($b, 3, 16)
    #            || substr($b, 20, 3) cmp substr($a, 20, 3) } @recs;
    my @recs = map { decode('cp1047', $_) }
               sort { key1 || key2 }
               (map { encode('cp1047', $_) } @recs);
    warn " writing;...";
    open $fh, ">$fh" or die $!;
    binmode($fh);
    print $fh @recs;
    close $fh;
    warn "done;\n";
}

warn "Merging files: ";
*SORTED = *STDOUT;
open SORTED, '>', $ARGV[1] and binmode(SORTED) or die $! if $ARGV[1];

for my $fh (reverse @fhs) {
    warn " $fh;";
    open $fh, "<$fh" and binmode($fh) or die $!;
    print SORTED <$fh>;
    close $fh;
}

warn "\nClosing sorted file: sorted\n";
close SORTED;
warn "Deleting temp files\n";
unlink $_ or warn "Couldn't unlink $_\n" for @fhs;
warn "Done.\n";
exit (0);

I no longer face the memory problem, but the script outputs fewer records than the input contains, and the sort does not seem to be working.
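One possible cause of the missing records, offered as a guess rather than a diagnosis: the script tests `-z` on the split files before closing them. Perl buffers output, so a bucket that has received only a small amount of data can still have size zero on disk at that point, and the `grep` would then drop it from `@fhs`, so it is never sorted or merged. The sketch below shows the effect in isolation; the directory and file name are arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/temp.1";            # arbitrary name, for illustration only

open my $fh, '>', $path or die "Cannot create $path: $!";
print {$fh} "one short record\n";    # held in Perl's output buffer for now

# Nothing has been flushed yet, so the file on disk is still empty.
my $empty_while_open = -z $path ? 1 : 0;

close $fh or die "Cannot close $path: $!";   # close flushes the buffer

# After the close the data is on disk and the file is no longer empty.
my $empty_after_close = -z $path ? 1 : 0;

print "empty while open:  $empty_while_open\n";
print "empty after close: $empty_after_close\n";
```

If this is what is happening, closing the handles before the `-z` test should avoid it. Two other things may be worth checking: `$/ = \$reclen` makes every read return 8072 bytes, so the `substr` offsets only line up if the records really are that long, and the merge loop reads the split files in `reverse` order even though key1 is meant to sort ascending.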

A couple of sample records are enclosed:

SK 1242 0180010100 AAR CPH AAR 0735 CPH 0810001 20070521200705211 SK 1242

SK 1242 0190010100 AAR CPH AAR 0735 CPH 0810001 2007052699999999 6 SK 1242
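For what it's worth, a common alternative to splitting on a key prefix is an external merge sort: sort fixed-size chunks in memory, write each chunk to a temporary file as a sorted run, then merge the runs. This is a minimal sketch, not the code from the earlier thread. The function names and the chunk size are hypothetical, and it assumes newline-terminated records laid out like the samples above (key1 at offset 3, length 17; key2 at offset 20, length 2).

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

sub ebcdic { return encode('cp1047', $_[0]) }

# key1 ascending, then key2 descending, both compared in EBCDIC (cp1047).
sub compare_records {
    my ($x, $y) = @_;
    return ebcdic(substr($x, 3, 17)) cmp ebcdic(substr($y, 3, 17))
        || ebcdic(substr($y, 20, 2)) cmp ebcdic(substr($x, 20, 2));
}

# Read $in in chunks of at most $max_records lines, sort each chunk,
# and write it to an anonymous temp file. Returns the run filehandles.
sub write_sorted_runs {
    my ($in, $max_records) = @_;
    my (@runs, @chunk);
    my $flush = sub {
        return unless @chunk;
        my $tmp = tempfile();        # removed automatically when closed
        print {$tmp} $_ for sort { compare_records($a, $b) } @chunk;
        seek($tmp, 0, 0) or die "Cannot rewind temp file: $!";
        push @runs, $tmp;
        @chunk = ();
    };
    while (my $line = <$in>) {
        push @chunk, $line;
        $flush->() if @chunk >= $max_records;
    }
    $flush->();
    return @runs;
}

# Merge the sorted runs by repeatedly emitting the smallest head record.
sub merge_runs {
    my ($out, @runs) = @_;
    my @heads = grep { defined $_->[0] }
                map  { [ scalar readline($_), $_ ] } @runs;
    while (@heads) {
        # Linear scan; fine for a modest number of runs.
        my $min = 0;
        for my $i (1 .. $#heads) {
            $min = $i if compare_records($heads[$i][0], $heads[$min][0]) < 0;
        }
        print {$out} $heads[$min][0];
        my $next = readline($heads[$min][1]);
        if (defined $next) { $heads[$min][0] = $next }
        else               { splice @heads, $min, 1 }
    }
    return;
}

sub external_sort {
    my ($in, $out, $max_records) = @_;
    merge_runs($out, write_sorted_runs($in, $max_records));
    return;
}

# Usage: script.pl input_file output_file
if (@ARGV == 2) {
    open my $in,  '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $out, '>', $ARGV[1] or die "Cannot create $ARGV[1]: $!";
    external_sort($in, $out, 50_000);    # chunk size is a guess; tune to memory
    close $out or die "Cannot close $ARGV[1]: $!";
}
```

Every run stays open during the merge, so the chunk size has to be large enough to keep the number of runs under the open-file limit.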


In reply to Sort large files by ramish
