in reply to sorting an array of hashes and removing duplicates

use strict;
use warnings;

...

my $sql = qq|
    SELECT ID, tutorID, Latitude, Longitude, City, State, Zipcode
    FROM $table_tzips
    WHERE Zipcode IS NOT NULL
|;

my $sth = $dbh->prepare($sql);
my %closest;
$sth->execute();

while (my $tzips = $sth->fetchrow_hashref()) {
    my $tutorID = $tzips->{tutorID};
    my $dist = calculate_distance(
        $clong, $clat, $tzips->{Longitude}, $tzips->{Latitude},
    );

    # Keep only the closest zipcode seen so far for each tutor.
    next if $closest{$tutorID} && $closest{$tutorID}{Dist} <= $dist;

    $closest{$tutorID} = {
        ID      => $tutorID,
        tzipID  => $tzips->{ID},
        Dist    => $dist,
        City    => $tzips->{City},
        State   => $tzips->{State},
        Zipcode => $tzips->{Zipcode},
    };
}

my @report_fields = qw( tzipID ID Dist City State Zipcode );
for my $tutor ( sort { $a->{Dist} <=> $b->{Dist} } values %closest ) {
    print(join(', ', @{$tutor}{@report_fields}), "\n");
}

Replies are listed 'Best First'.
Re^2: sorting an array of hashes and removing duplicates
by salva (Canon) on Apr 02, 2010 at 07:36 UTC
    or ...
    my $sql = qq|
        SELECT ID, tutorID, Latitude, Longitude, City, State, Zipcode
        FROM $table_tzips
        WHERE Zipcode IS NOT NULL
        ORDER BY tutorID
    |;

    my $last_tutor_id = "";
    my $last_dist;
    my @tutors;

    my $sth = $dbh->prepare($sql);
    $sth->execute();

    while (my $tzips = $sth->fetchrow_hashref()) {
        my $tutor_id = $tzips->{tutorID};
        my $dist = calculate_distance(
            $clong, $clat, $tzips->{Longitude}, $tzips->{Latitude},
        );

        if ($last_tutor_id ne $tutor_id) {
            # First row for this tutor.
            $last_tutor_id = $tutor_id;
            push @tutors, $tzips;
        }
        else {
            # Same tutor: keep the previous row unless this one is closer.
            next if $last_dist <= $dist;
            $tutors[-1] = $tzips;
        }
        $tzips->{Dist} = $last_dist = $dist;
    }

    # These rows carry the raw column names (ID, tutorID), not the
    # tzipID/ID aliases built in the reply above.
    my @report_fields = qw( ID tutorID Dist City State Zipcode );
    for my $tutor ( sort { $a->{Dist} <=> $b->{Dist} } @tutors ) {
        print(join(', ', @{$tutor}{@report_fields}), "\n");
    }
    In the OP's case, where the full data set has to be kept in memory for the final sort anyway, this approach may just be more complicated than ikegami's, and the ORDER BY also imposes an extra load on the database.

    But when you can output the data as you go and don't need to keep it in memory for a final processing stage, it may perform better, especially for large data sets, as its memory usage is fixed at O(1) rather than growing with the data set size, O(N).
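    The streaming variant described above can be sketched without the database at all. The rows, field values, and the emit helper below are invented for illustration; what matters is that once the input arrives sorted by tutorID, each tutor's closest row can be emitted the moment the next tutor begins, so only one candidate row is ever held in memory:

    ```perl
    use strict;
    use warnings;

    # Hypothetical rows, already sorted by tutorID (as "ORDER BY tutorID"
    # would deliver them); the field names mirror the thread's columns.
    my @rows = (
        { tutorID => 'a', Zipcode => '10001', Dist => 5.2 },
        { tutorID => 'a', Zipcode => '10002', Dist => 2.1 },
        { tutorID => 'b', Zipcode => '20001', Dist => 7.9 },
    );

    my $last_tutor_id = "";
    my $best;    # closest row for the current tutor only
    my @out;

    for my $row (@rows) {
        if ($row->{tutorID} ne $last_tutor_id) {
            # New tutor: the previous tutor's group is complete, emit it now.
            push @out, emit($best) if $best;
            $last_tutor_id = $row->{tutorID};
            $best = $row;
        }
        elsif ($row->{Dist} < $best->{Dist}) {
            $best = $row;
        }
    }
    push @out, emit($best) if $best;    # flush the final group

    print "$_\n" for @out;

    sub emit {
        my ($tutor) = @_;
        return join(', ', @{$tutor}{qw(tutorID Zipcode Dist)});
    }
    ```

    Note the trade-off: emitting as you go gives up the final sort by Dist, which is exactly why this only pays off when no final processing stage is needed.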