shaolinMonk has asked for the wisdom of the Perl Monks concerning the following question:

I have packed a really large array (200 million doubles) that is packed into a scalar to conserve memory. I'd like to sort the array in its packed form. Is that possible? The problem with sorting the array (as against the packed string) is that it consumes too much memory (about 10x that of the packed scalar).

Replies are listed 'Best First'.
Re: Sorting packed arays
by BrowserUk (Patriarch) on Dec 06, 2008 at 05:02 UTC

    In pure Perl, it'll take ~ 7 hours:

    #! perl -slw use strict; no warnings 'recursion'; $|++; sub qsortpd { my( $ref, $lo, $hi ) = @_; my $pivot = unpack( 'd', substr $$ref, 8*int( ( $lo + $hi ) / 2 ), + 8 ); my( $lb, $rb ) = ( $lo, $hi ); while( 1 ) { $lb++ while unpack( 'd', substr $$ref, 8*$lb, 8 ) < $pivot; $rb-- while unpack( 'd', substr $$ref, 8*$rb, 8 ) > $pivot; $lb++ while $lb != $rb and unpack( 'd', substr $$ref, 8*$lb, 8 ) == unpack( 'd', substr $$ref, 8*$rb, 8 ); last if $lb == $rb; my $temp = substr( $$ref, 8*$lb, 8 ); substr( $$ref, 8*$lb, 8 ) = substr( $$ref, 8*$rb, 8 ); substr( $$ref, 8*$rb, 8 ) = $temp; } qsortpd( $ref, $lo, $lb ) if $lo < --$lb; qsortpd( $ref, $rb, $hi ) if $hi > ++$rb; } our $N ||= 1e6; my $packedDoubles = ''; open RAM, '>', \$packedDoubles; seek RAM, $N*8, 0; print RAM chr(0); seek RAM, 0, 0; printf RAM "%s", pack 'd1000', map{ rand 32767 } 1 .. 1000 for 1 .. $N + / 1000; close RAM; print "Starting sort of $N packed doubles: ", time; qsortpd( \$packedDoubles, 0, $N-1 ); print "Finished sort of $N packed doubles: ", time; <>; print for unpack 'd*', $packedDoubles;

    Using Inline::C, a little over 2 minutes:

    #! perl -slw use strict; use Devel::Peek; use Inline C => Config => BUILD_NOISY => 1; use Inline C => <<'END_C', NAME => 'QsortPD', CLEAN_AFTER_BUILD => 0; #include <stdlib.h> #include <string.h> #include <stdio.h> int compare( const void *arg1, const void *arg2 ) { double diff = *(double*)arg1 - *(double*)arg2; return diff < 0 ? -1 : diff > 0 ? 1 : 0; } int qsortPackedDoubles( SV *data, SV *n ) { qsort( (void *)SvPVX( data ), SvUV( n ), sizeof( double ), compare + ); } END_C our $N ||= 1e6; ## Warning N less that 1000 will give funny results my $packedDoubles = ''; ## Preallocate the scalar and random fill with packed doubles open RAM, '>', \$packedDoubles; seek RAM, $N*8, 0; print RAM chr(0); seek RAM, 0, 0; printf RAM "%s", pack 'd1000', map{ rand 32767 } 1 .. 1000 for 1 .. $N + / 1000; close RAM; print "Starting sort of $N packed doubles: ", time; qsortPackedDoubles( $packedDoubles, $N ); print "Finished sort of $N packed doubles: ", time; <>; print for unpack 'd*', $packedDoubles;

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Sorting packed arays
by kyle (Abbot) on Dec 06, 2008 at 02:58 UTC
      Could you put it into a Tie::Array::Pack or Tie::Array::Packed array and just use sort?

      That does not work as you expect, sort unpacks the array before sorting it.

      Though, Tie::Array::Packed has a sort method that can be used for that. For instance:

      tie my @p, 'Tie::Array::Packed::Integer'; push @p, $_ for (.....); tied(@p)->sort;
      (it just calls Sort::Packed under the hood)
Re: Sorting packed arays
by spmlingam (Scribe) on Dec 06, 2008 at 07:18 UTC
    You can use Sort::Packed module
    . Use mergesort_packed - merge sort is useful for sorting large data
      merge sort is useful for sorting large data

      Well, the merge sort algorithm can be used to sort huge data sets that don't fit in the available RAM, but that doesn't mean that any merge sort implementation does it.

      Actually, as implemented in Sort::Packed (BTW, I am its author), the radixsort uses less memory as it is an inplace operation. The only practical difference between both variants is that merge sort can take advantage of semi sorted data.