Can Schwartzian transform be modified to sort an arrayref uniquely?

corporate_gadfly has asked for the wisdom of the Perl Monks concerning the following question:

O' Wise ones, Can an enlightened soul tell me if the Schwartzian transform can be modified to sort uniquely items inside an arrayref while running under 'use strict'?

My code looks like this (I'm trying to sort uniquely on the field called num):

#!/usr/local/bin/perl -w

use strict;

my $data = (
    [
        {
            num     => 'OF1234',
            title   => 'title OF1234',
        },
        {
            num     => 'AF1234',
            title   => 'title AF1234',
        },
        {
            num     => 'AF1234',
            title   => 'title AF1234',
        },
    ]
);

#print @{$data};

my @sorted =
                # now get the original array ref back
    map { $_->[0] }
                # use field 1 of anonymous array to sort (in this
                # case num)
    sort { $a->[1] cmp $b->[1] }
                # make an anonymous array of itself and num
    map { [$_, $_->{num}] }
                # start with the array ref and make it an array
    @{$data};
print join ', ', map { $_->{num} } @sorted;
print "\n";

my %saw;
undef %saw;

my @unique = grep !$saw{$_->{num}}++, @sorted;
print join ', ', map { $_->{num} } @unique;

print "\n";
exit;
[download]

which produces the desired output (first line sorted, second line unique):

AF1234, AF1234, OF1234
AF1234, OF1234
[download]

Is there a way to get rid of the unsightly grep by using something similar to:

my @keys = sort keys %{ { map { $_ => 1 } @array1 } };
[download]

keeping in mind that we're talking about hash data inside the arrrayref.

I can't get my hands around it. It's a wonder that I understand the Schwartzian Transform to begin with. Here's a detailed explanation of the Schwartzian at http://www.5sigma.com/perl/schwtr.html for people wanting to know more. Thanks in advance,

Comment on Can Schwartzian transform be modified to sort an arrayref uniquely? Select or Download Code

Replies are listed 'Best First'.
(Ovid) Re: Can Schwartzian transform be modified to sort an arrayref uniquely? by Ovid (Cardinal) on Jan 04, 2002 at 21:54 UTC
Well, before we get to your question, there are a few things that need to be cleared up. First, it's good to see you trying out advanced techniques and getting more tools in your box, but you do have a few issues. Your first data structure has parentheses around it. That makes it a comma expression which, when evaluated in list context, will return a list. In scalar context, its value is whatever the last item evaluates to. You only have one item, so your scalar gets the array ref and your code works, but you should avoid doing things like this unless you're positive that this is what you need. Drop the parentheses. The Schwartzian is overkill. A Schwartzian transform is great if you have an expensive step to extract the data and put it in a sortable format. You do not have such a step. Your Schwartzian reduces to this: `my @sorted = sort { $a->{num} cmp $b->{ num } } @$data;` [download] If it makes you feel better, I was caught by the same issue. See Sort on Boredom :) Here's how I shortened your code: `#!/usr/local/bin/perl -w use strict; my $data = [ { num => 'OF1234', title => 'title OF1234', }, { num => 'AF1234', title => 'title AF1234', }, { num => 'AF1234', title => 'title AF1234', }, ]; my %saw; my @sorted = grep { ! $saw{$_->{num}}++ } sort { $a->{num} cmp $b->{ num } } @$data; print join ', ', map { $_->{num} } @sorted;` [download] Hope that helps. Cheers, Ovid Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.	[reply] [d/l] [select]
(tye)Re: Can Schwartzian transform be modified to sort an arrayref uniquely? by tye (Sage) on Jan 04, 2002 at 22:13 UTC
You can also either do the grep first so that the sort will be faster or leave it there and make use of the fact that the data is now sorted to avoid creating a possibly large %saw hash: `my $prev= ""; my @sorted= grep { ( $_->{num} ne $prev, $prev= $_->{num} )[0] } sort { $a->{num} cmp $b->{num} } @$data;` [download] Which doesn't matter for such a small data set, of course. (: - tye (but my friends call me "Tye")	[reply] [d/l]
Re: (tye)Re: Can Schwartzian transform be modified to sort an arrayref uniquely? by khkramer (Scribe) on Jan 04, 2002 at 22:51 UTC
Indeed. There are trade-offs everywhere: by default, I always write `sort {} grep {} @foo;` [download] so that the grep happens first. The speed is more important than the memory for almost everything I do. Here's a little bit of lies/damn-lies/statistics demonstrating the speed difference on an array in which most values appear twice. #!/usr/local/bin/perl -w use Benchmark; use strict; my @data; my %seen; foreach ( 1..100_000 ) { push @data, int($_/2) } fisher_yates_shuffle ( \@data ); #my @gs = gs(); print join ( "\n", @gs ); #my @sg = sg(); print join ( "\n", @sg ); timethese ( 20, { sort_grep => \&sg, grep_sort => \&gs, } ); sub gs { return sort { $a <=> $b } grep { ! $seen{$_} ++ }@data; } sub sg { return grep { ! $seen{$_} ++ } sort { $a <=> $b } @data; } # fisher_yates_shuffle( \@array ) : from Perl Cookbook sub fisher_yates_shuffle { my $array = shift; my $i; for ($i = @$array; --$i; ) { my $j = int rand ($i+1); next if $i == $j; @$array[$i,$j] = @$array[$j,$i]; } } [download] And the output I get: `Benchmark: timing 20 iterations of grep_sort, sort_grep... grep_sort: 5 wallclock secs ( 4.70 usr + 0.02 sys = 4.72 CPU) @ 4 +.24/s (n=20) sort_grep: 16 wallclock secs (15.56 usr + 0.03 sys = 15.59 CPU) @ 1 +.28/s (n=20)` [download] As a side note, I regard this kind of optimization as "habitual", rather than of the premature kind. <laugh>	[reply] [d/l] [select]
Re: (Ovid) Re: Can Schwartzian transform be modified to sort an arrayref uniquely? by gbarr (Monk) on Jan 05, 2002 at 01:24 UTC
One of the "features" of the Schwartzian transform is that you don't have to create any named temporary variables. So instead of using `%saw` or `$prev` you could do `my $data = [ { num => 'OF1234', title => 'title OF1234', }, { num => 'AF1234', title => 'title AF1234', }, { num => 'AF1234', title => 'title AF1234', }, ]; my @sorted = sort { $a->{num} cmp $b->{ num } } values %{ +{ map { ($_->{num},$_) } @$data }}; print join ', ', map { $_->{num} } @sorted;` [download]	[reply] [d/l] [select]
Re: Can Schwartzian transform be modified to sort an arrayref uniquely? by khkramer (Scribe) on Jan 04, 2002 at 21:57 UTC
You don't actually need to use a Schwartzian Transform here, because the operation that is "cached" by the transform (access to the 'num' field of the hash) isn't an expensive operation. You were on the right track with the "%saw" stuff -- you can just prepend (syntactically; postpend operationally <laugh>) a sort: `my %saw; my @uniq = sort { $a->{num} cmp $b->{num} } grep { ! $saw{$_->{num}}++ } @{$data};` [download] That's all you need. And for a situation where you do need to use something like a Schwarzian transform, and you'd like to tighten up the code, you could add the grep to the front (or better, the back -- again, syntactically) of the map/sort/map. (Sometimes it's a little mind-bending, but as you've seen, you can string together the list-manipulation operators to your heart's content.) You can't get rid of the grep entirely because neither map nor sort will "drop" items from a list, only transform or rearrange them.	[reply] [d/l]
Re: Can Schwartzian transform be modified to sort an arrayref uniquely? by petral (Curate) on Jan 04, 2002 at 22:36 UTC
This is probably more than you wanted to know, but it does answer your question directly (how to sort and unique in one step). It's a suggestion MeowChow posted last summer in a discussion of sorts. It uses a hash slice to store the "transform" with the array index and an array slice to return the elements sorted on the transform key. It does not scale well(!). `my %by; @by{ map $_->{num}, @$data } = 0 .. $#$data; @sorted_unique = @{$data}[ @by{ sort keys %by } ];` [download] update: Actually, I now see that the indexing is totally superfluous, so it would be: `my %by = map{ $_->{num} => $_ } @$data; @sorted_unique = @by{sort keys %by};` [download] which actually begins to look reasonable. On the other hand, the sure enough do-it-all-in-one-line with an anonymous hash: `@sorted_unique = @{{ map{ $_->{num} => $_ } @$d }}{ sort map $_->{num}, @$d };` [download] goes through the array twice and also builds a hash. And it even sorts the whole array before selecting unique elements which is the worst way to do it as khkramer demonstrates above. It is kinda neat though. p	[reply] [d/l] [select]