(tye)Re: Can Schwartzian transform be modified to sort an arrayref uniquely?

You can also either do the grep first so that the sort will be faster or leave it there and make use of the fact that the data is now sorted to avoid creating a possibly large %saw hash:

my $prev= "";
my @sorted=
    grep { ( $_->{num} ne $prev, $prev= $_->{num} )[0] }
    sort { $a->{num} cmp $b->{num} }
    @$data;
[download]

Which doesn't matter for such a small data set, of course. (:

- tye (but my friends call me "Tye")

Comment on (tye)Re: Can Schwartzian transform be modified to sort an arrayref uniquely? Download Code

Replies are listed 'Best First'.
Re: (tye)Re: Can Schwartzian transform be modified to sort an arrayref uniquely? by khkramer (Scribe) on Jan 04, 2002 at 22:51 UTC
Indeed. There are trade-offs everywhere: by default, I always write `sort {} grep {} @foo;` [download] so that the grep happens first. The speed is more important than the memory for almost everything I do. Here's a little bit of lies/damn-lies/statistics demonstrating the speed difference on an array in which most values appear twice. #!/usr/local/bin/perl -w use Benchmark; use strict; my @data; my %seen; foreach ( 1..100_000 ) { push @data, int($_/2) } fisher_yates_shuffle ( \@data ); #my @gs = gs(); print join ( "\n", @gs ); #my @sg = sg(); print join ( "\n", @sg ); timethese ( 20, { sort_grep => \&sg, grep_sort => \&gs, } ); sub gs { return sort { $a <=> $b } grep { ! $seen{$_} ++ }@data; } sub sg { return grep { ! $seen{$_} ++ } sort { $a <=> $b } @data; } # fisher_yates_shuffle( \@array ) : from Perl Cookbook sub fisher_yates_shuffle { my $array = shift; my $i; for ($i = @$array; --$i; ) { my $j = int rand ($i+1); next if $i == $j; @$array[$i,$j] = @$array[$j,$i]; } } [download] And the output I get: `Benchmark: timing 20 iterations of grep_sort, sort_grep... grep_sort: 5 wallclock secs ( 4.70 usr + 0.02 sys = 4.72 CPU) @ 4 +.24/s (n=20) sort_grep: 16 wallclock secs (15.56 usr + 0.03 sys = 15.59 CPU) @ 1 +.28/s (n=20)` [download] As a side note, I regard this kind of optimization as "habitual", rather than of the premature kind. <laugh>	[reply] [d/l] [select]

Replies are listed 'Best First'.

Re: (tye)Re: Can Schwartzian transform be modified to sort an arrayref uniquely?
by khkramer (Scribe) on Jan 04, 2002 at 22:51 UTC

Indeed. There are trade-offs everywhere: by default, I always write

 sort {} grep {} @foo;
[download]

so that the grep happens first. The speed is more important than the memory for almost everything I do.

Here's a little bit of lies/damn-lies/statistics demonstrating the speed difference on an array in which most values appear twice.

#!/usr/local/bin/perl -w

use Benchmark;
use strict;

my @data;
my %seen;
foreach ( 1..100_000 ) { push @data, int($_/2) }
fisher_yates_shuffle ( \@data );

#my @gs = gs(); print join ( "\n", @gs );
#my @sg = sg(); print join ( "\n", @sg );

timethese ( 20, { sort_grep => \&sg,
                  grep_sort => \&gs,
} );

sub gs {
  return sort { $a <=> $b }
    grep { ! $seen{$_} ++ }@data;
}

sub sg {
  return grep { ! $seen{$_} ++ }
    sort { $a <=> $b } @data;
}

# fisher_yates_shuffle( \@array ) : from Perl Cookbook
sub fisher_yates_shuffle {
    my $array = shift;
    my $i;
    for ($i = @$array; --$i; ) {
        my $j = int rand ($i+1);
        next if $i == $j;
        @$array[$i,$j] = @$array[$j,$i];
    }
}
[download]

And the output I get:

Benchmark: timing 20 iterations of grep_sort, sort_grep...
 grep_sort:  5 wallclock secs ( 4.70 usr +  0.02 sys =  4.72 CPU) @  4
+.24/s (n=20)
 sort_grep: 16 wallclock secs (15.56 usr +  0.03 sys = 15.59 CPU) @  1
+.28/s (n=20)
[download]

As a side note, I regard this kind of optimization as "habitual", rather than of the premature kind. <laugh>

[reply]
[d/l]
[select]