Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I've downloaded from cpan and installed creamyg's Sort::External module, but it does not seem to be working. Here is the behavior:

1. I created a small unsorted list of data to be sorted.

2. I created my own sort subroutine.

3. Ran the sort with the following settings:

my $sortex = Sort::External->new( -mem_threshold => 2**26, -sortsub => $sortscheme, -working_dir => $temp_directory,);

and it properly sorted my data.

4. Reran the sort with -mem-threshold set to 2**20, forcing the use of temp files and the routine went into an endless loop.

It seems that as soon as the disk temporary file are used, the sort gets stuck in a loop.

Thanks,

--Doug Roberts

doug@parrot-farm.net

Replies are listed 'Best First'.
Re: Problem with Sort::External 0.16
by tilly (Archbishop) on Jul 12, 2008 at 06:04 UTC
    The bug in the module is that it is not prepared to handle a sort routine that can receive 2 distinct arguments and return 0. If you enable a last comparison in your sort routine that just does $Sort::External::a cmp $Sort::External::b then you won't get an endless loop. Alternately, assuming that Perl is using a stable sort (it does by default I believe, but that's not a safe default) you can fix the bug with the following patch to Sort/External.pm:
    --- External.pm.bak 2008-07-11 22:30:52.000000000 -0700 +++ External.pm 2008-07-11 22:52:14.000000000 -0700 @@ -322,7 +322,7 @@ $comparecode = sub { my $test = shift; return 0 if $test eq $cutoff; - my $winner = ( sort $sortsub( $cutoff, $test ) )[0]; + my $winner = ( sort $sortsub( $test, $cutoff ) )[0]; return $winner eq $cutoff ? 1 : -1; }; }
    While I'm at it I should mention that you have a lot of unnecessary checks in your comparison. You're writing
    # .... return -1 if $hours1 < $hours2; return 1 if $hours1 > $hours2; return -1 if $hours1 == $hours2 && $minutes1 < $minutes2; return 1 if $hours1 == $hours2 && $minutes1 > $minutes2; return -1 if $hours1 == $hours2 && $minutes1 == $minutes2 && $sec +onds1 < $seconds2; return 1 if $hours1 == $hours2 && $minutes1 == $minutes2 && $seco +nds1 > $seconds2;
    but if $hours1 and $hours2 are different then you'll never get to the minutes comparison. So you can get rid of those and just have:
    # ... return -1 if $hours1 < $hours2; return 1 if $hours1 > $hours2; return -1 if $minutes1 < $minutes2; return 1 if $minutes1 > $minutes2; return -1 if $seconds1 < $seconds2; return 1 if $seconds1 > $seconds2;
    But you can write this more idiomatically using the fact that || short circuits:
    return $hours1 <=> $hours2 || $minutes1 <=> $minutes2 || $seconds1 <=> $seconds2 || $Sort::External::a cmp $Sort::External::b;
    Incidentally in one of the few useful uses of prototypes in Perl, if a sort sub has a prototype of ($$) then the arguments are passed in as @_ and not $a, $b. So if you write
    my $sortscheme = sub ($$) { my @fieldsa = split(" ", shift); my @fieldsb = split(" ", shift); # ... };
    then you don't have to worry about what package the sort is coming from.

    Oh and a final tip? You should use strict.pm. Always.

      tilly: The bug in the module is that it is not prepared to handle a sort routine that can receive 2 distinct arguments and return 0.

      This problem should be resolved in Sort::External version 0.17. The $comparecode closure which you provided a band-aid for has been superseded by a private _compare() method, which should be able to handle both those sortsubs which use $Sort::External::a and $Sort::External::b and those which use positional arguments:

      # Compare two elements using either standard lexical comparison or the + sortsub # provided to the object's constructor. sub _compare { my ( $self, $item_a, $item_b ) = @_; my $sortsub = $self->_get_sortsub; if ( defined $sortsub ) { local $a = $item_a; local $b = $item_b; return $sortsub->( $a, $b ); } else { return $item_a cmp $item_b; } }
      --
      Marvin Humphrey
      Rectangular Research ― http://www.rectangular.com

      Nice work, Tilly -- thanks! One of us should package the patched External.pm up into a new version of Sort::External and send it off to the folks at cpan to replace the current version. Let me know if you want to do it, or if you'd rather that I did.

      --Doug

      doug@parrot-farm.net

        Random people cannot replace random modules on CPAN. Allowing that would be a serious security hole. Instead the module author has to do it.

        That is why I CCed the module author in my email to you.

Re: Problem with Sort::External 0.16
by Anonymous Monk on Jul 12, 2008 at 01:47 UTC
    sample data and code can be found at the bug report

    you should have mentioned this yourself doug

      My bad. Would have done so, had the perlmonks site "Create New User" function not been broken. I've been waiting for my new account password for 6 hours now. --Doug
Re: Problem with Sort::External 0.16
by Khen1950fx (Canon) on Jul 12, 2008 at 05:53 UTC
    I've tried all the versions of Sort::External, and none of them work. Did you notice the error message "Sort::External.pm has too many errors"? I've had some good results with File::Sort.
      As you can see below, tilly fixed the bug in Sort::External.

      BTW, the reason that I cannot use File::Sort is that I am dealing with data sets that are too large to fit into memory.

      --Doug

        tilly did a great job. The patch helped. Now, I hope that the module's author will take notice and fix it:).