maxon11 has asked for the wisdom of the Perl Monks concerning the following question:
... DEAR PERL MONKS , I am a newbie to PERL who has kind of dove into the deep end of the pool and have done so without fully reading up on all PERL books and materials.. But I think my questions are pretty serious and should be easy for a PERL veteran like yourself to answer --- below is a LOG of a Problem report I submitted today regarding a SORT pm module called: File-Sort I found on the web using google at CPAN. I am using the sort.pudge file to get around not being able to install the module myself to @INC which is locked down at my site. File-Sort version 1.00 or 1.01 ????
file in question is 400, 987 records h2z.prod.400987aa with a record length of 103 and a sample record looks like this:This is perl, version 5.005_03 built for aix AIX pbsxdr00041 2 5 000258CA4C00 >perl sort.pudge h2z.prod.400987aa > h2z.prod1a Use of uninitialized value at sort.pudge line 266.
when i run it like this i get a different error msg:20040623CC255533 3130 268713110 3 20 +040623 8.96 1.000
Is the program truly out of memory or is this a bug in the package or in the implementation or in the module itself?? This code which seems to be doing the same thing is only 10 lines long and can work on a file 3 times this size that is 90mb large.. when all other sort perl programs I could find run out of memory like this one:>perl -w sort.pudge -d h2z.prod.400987aa > h2z.prod1a Out of memory!
"perl_sort.pl" 7 lines, 301 characters #/usr/bin/perl open(MYINPUTFILE, "<h2z_zdl0.000"); # open="open" for="for" input="inp +ut" my(@lines)="<MYINPUTFILE">; # read file into list @lines = sort(@lines); # sort the list my($line);foreach $line (@lines) # loop thru list { print "$line"; # print in sort order } close(MYINPUTFILE);
++++++++++++++++++++++++++++++++THIS PROGRAM WORKS THOUGH --- why?????/opt/vrs/files/PROD_8 PBN2I] >perl -w perl_sort.pl Out of memory!
#!/usr/bin/perl open (ORIGFILE, "H2Z_ZDL0.000"); open (FINALFILE, ">sorted.txt"); print FINALFILE sort(<origfile>); close (FINALFILE); close (ORIGFILE);
>>>>> "Andrew" == McGrath, Andrew {PBG} <andrew.mcgrath@pepsi.com> writes: Andrew> #!/usr/bin/perl Andrew> open (ORIGFILE, Andrew> "H2Z_ZDL0.000"); Andrew> open (FINALFILE, ">sorted.txt"); Andrew> print FINALFILE <<-- what is this doing Andrew> sort(<origfile>); <<-- what is this doing Andrew> close (FINALFILE); Andrew> close (ORIGFILE);perldoc of the sort function in readmore
$ perldoc -t -f sort sort SUBNAME LIST sort BLOCK LIST sort LIST In list context, this sorts the LIST and returns the sorted + list value. In scalar context, the behaviour of "sort()" is undefined. If SUBNAME or BLOCK is omitted, "sort"s in standard string comparison order. If SUBNAME is specified, it gives the nam +e of a subroutine that returns an integer less than, equal to, o +r greater than 0, depending on how the elements of the list a +re to be ordered. (The "<=>" and "cmp" operators are extremely us +eful in such routines.) SUBNAME may be a scalar variable name (unsubscripted), in which case the value provides the name +of (or a reference to) the actual subroutine to use. In place +of a SUBNAME, you can provide a BLOCK as an anonymous, in-line s +ort subroutine. If the subroutine's prototype is "($$)", the elements to be compared are passed by reference in @_, as for a normal subroutine. This is slower than unprototyped subroutines, w +here the elements to be compared are passed into the subroutine +as the package global variables $a and $b (see example below). + Note that in the latter case, it is usually counter-productive t +o declare $a and $b as lexicals. In either case, the subroutine may not be recursive. The va +lues to be compared are always passed by reference, so don't mod +ify them. You also cannot exit out of the sort block or subroutine us +ing any of the loop control operators described in perlsyn or w +ith "goto". When "use locale" is in effect, "sort LIST" sorts LIST acco +rding to the current collation locale. See perllocale. Perl 5.6 and earlier used a quicksort algorithm to implemen +t sort. That algorithm was not stable, and *could* go quadrat +ic. (A *stable* sort preserves the input order of elements that compare equal. Although quicksort's run time is O(NlogN) wh +en averaged over all arrays of length N, the time can be O(N** +2), *quadratic* behavior, for some inputs.) In 5.7, the quickso +rt implementation was replaced with a stable mergesort algorit +hm whose worst case behavior is O(NlogN). But benchmarks indic +ated that for some inputs, on some platforms, the original quick +sort was faster. 5.8 has a sort pragma for limited control of th +e sort. Its rather blunt control of the underlying algorithm +may not persist into future perls, but the ability to character +ize the input or output in implementation independent ways quit +e probably will. See sort. Examples: # sort lexically @articles = sort @files; # same thing, but with explicit sort routine @articles = sort {$a cmp $b} @files; # now case-insensitively @articles = sort {uc($a) cmp uc($b)} @files; # same thing in reversed order @articles = sort {$b cmp $a} @files; # sort numerically ascending @articles = sort {$a <=> $b} @files; # sort numerically descending @articles = sort {$b <=> $a} @files; # this sorts the %age hash by value instead of key # using an in-line function @eldest = sort { $age{$b} <=> $age{$a} } keys %age; # sort using explicit subroutine name sub byage { $age{$a} <=> $age{$b}; # presuming numeric } @sortedclass = sort byage @class; sub backwards { $b cmp $a } @harry = qw(dog cat x Cain Abel); @george = qw(gone chased yz Punished Axed); print sort @harry; # prints AbelCaincatdogx print sort backwards @harry; # prints xdogcatCainAbel print sort @george, 'to', @harry; # prints AbelAxedCainPunishedcatchaseddoggoneto +xyz # inefficiently sort by descending numeric compare usin +g # the first integer after the first = sign, or the # whole record case-insensitively otherwise @new = sort { ($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0] || uc($a) cmp uc($b) } @old; # same thing, but much more efficiently; # we'll build auxiliary indices instead # for speed @nums = @caps = (); for (@old) { push @nums, /=(\d+)/; push @caps, uc($_); } @new = @old[ sort { $nums[$b] <=> $nums[$a] || $caps[$a] cmp $caps[$b] } 0..$#old ]; # same thing, but without any temps @new = map { $_->[0] } sort { $b->[1] <=> $a->[1] || $a->[2] cmp $b->[2] } map { [$_, /=(\d+)/, uc($_)] } @old; # using a prototype allows you to use any comparison su +broutine # as a sort subroutine (including other package's subro +utines) package other; sub backwards ($$) { $_[1] cmp $_[0]; } # $a and $b + are not \ set here package main; @new = sort other::backwards @old; # guarantee stability, regardless of algorithm use sort 'stable'; @new = sort { substr($a, 3, 5) cmp substr($b, 3, 5) } @ +old; # force use of mergesort (not portable outside Perl 5.8 +) use sort '_mergesort'; # note discouraging _ @new = sort { substr($a, 3, 5) cmp substr($b, 3, 5) } @ +old; If you're using strict, you *must not* declare $a and $b as lexicals. They are package globals. That means if you're in + the "main" package and type @articles = sort {$b <=> $a} @files; then $a and $b are $main::a and $main::b (or $::a and $::b) +, but if you're in the "FooPack" package, it's the same as typing @articles = sort {$FooPack::b <=> $FooPack::a} @files; The comparison function is required to behave. If it return +s inconsistent results (sometimes saying $x[1] is less than $ +x[2] and sometimes saying the opposite, for example) the results + are not well-defined. Because "<=>" returns "undef" when either operand is "NaN" (not-a-number), and because "sort" will trigger a fatal err +or unless the result of a comparison is defined, when sorting +with a comparison function like "$a <=> $b", be careful about li +sts that might contain a "NaN". The following example takes advantage of the fact that "NaN != NaN" to eliminate any "N +aN"s from the input. @result = sort { $a <=> $b } grep { $_ == $_ } @input;
Andrew> How does perl know to default sort the whole record which is Andrew> 103 chars in length in alphabetical order and how does it do Andrew> so without a complicated sort function hand-written for the Andrew> file in question to assist it?? Read "Learning Perl". Andrew> IS this program using main memory or some other temp file Andrew> trick or pipes ??? because it did not fail when sorting a Andrew> 90mb file but I tried 4 other perl programs off the web and Andrew> numerous combos of unix sort/exec/xarg and got nowhere at all Andrew> !!! Can I get a moment of your time and a simple set of Andrew> answers here .... please. Thank You Again! Andrew McGrath It's all in memory.--
Andrew, perl coder in training. P.S. My 1st love in Unix and Korn shell but I am determined to rise to a high-level of proficiency with Perl before 2004 closes out on ,me in order to advance my career, knowledge, skills and most of all to save me time and make me more efficient and better at what I do .. which is being a 7X24Unix BackEnd Support Engineer for a 1.6 terabyte Oracle -Vendor driven Selling Database at PBG.
Andrew M. McGrath CPS IT SUPPORT PBG - SOMERS NY "Always Moving Forward"
Edit by castaway - hide html headers, fixed up formatting, restore email content without signature, replace long lines of stars with hrs
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: How to sort a large flat file 90mb with PERL-- various ways/tradeoffs/watchouts
by delirium (Chaplain) on Jun 30, 2004 at 01:36 UTC | |
Re: How to sort a large flat file 90mb with PERL-- various ways/tradeoffs/watchouts
by tachyon (Chancellor) on Jun 30, 2004 at 02:44 UTC | |
by theorbtwo (Prior) on Jun 30, 2004 at 07:16 UTC | |
Re: How to sort a large flat file 90mb with PERL-- various ways/tradeoffs/watchouts
by jarich (Curate) on Jun 30, 2004 at 09:47 UTC |