in reply to sorting an array of hashes and removing duplicates

use strict;
use warnings;

...

my $sql = qq|
    SELECT ID, tutorID, Latitude, Longitude, City, State, Zipcode
    FROM $table_tzips
    WHERE Zipcode IS NOT NULL
|;

my $sth = $dbh->prepare($sql);
my %closest;
$sth->execute();

while (my $tzips = $sth->fetchrow_hashref()) {
    my $tutorID = $tzips->{tutorID};
    my $dist = calculate_distance(
        $clong, $clat, $tzips->{Longitude}, $tzips->{Latitude},
    );

    # Keep only the closest zipcode seen so far for each tutor.
    next if $closest{$tutorID} && $closest{$tutorID}{Dist} <= $dist;

    $closest{$tutorID} = {
        ID      => $tutorID,
        tzipID  => $tzips->{ID},
        Dist    => $dist,
        City    => $tzips->{City},
        State   => $tzips->{State},
        Zipcode => $tzips->{Zipcode},
    };
}

my @report_fields = qw( tzipID ID Dist City State Zipcode );
for my $tutor ( sort { $a->{Dist} <=> $b->{Dist} } values %closest ) {
    print(join(', ', @{$tutor}{@report_fields}), "\n");
}

Replies are listed 'Best First'.
Re^2: sorting an array of hashes and removing duplicates
by salva (Canon) on Apr 02, 2010 at 07:36 UTC
    or ...
    my $sql = qq|
        SELECT ID, tutorID, Latitude, Longitude, City, State, Zipcode
        FROM $table_tzips
        WHERE Zipcode IS NOT NULL
        ORDER BY tutorID
    |;

    my $last_tutor_id = "";
    my $last_dist;
    my @tutors;

    my $sth = $dbh->prepare($sql);
    $sth->execute();

    while (my $tzips = $sth->fetchrow_hashref()) {
        my $tutor_id = $tzips->{tutorID};
        my $dist = calculate_distance(
            $clong, $clat, $tzips->{Longitude}, $tzips->{Latitude},
        );

        if ($last_tutor_id ne $tutor_id) {
            # First row for this tutor.
            $last_tutor_id = $tutor_id;
            push @tutors, $tzips;
        }
        else {
            # Same tutor: keep the previous row unless this one is closer.
            next if $last_dist <= $dist;
            $tutors[-1] = $tzips;
        }
        $tzips->{Dist} = $last_dist = $dist;
    }

    # These rows carry the raw column names (ID, tutorID), not the
    # tzipID/ID aliases built in the reply above.
    my @report_fields = qw( ID tutorID Dist City State Zipcode );
    for my $tutor ( sort { $a->{Dist} <=> $b->{Dist} } @tutors ) {
        print(join(', ', @{$tutor}{@report_fields}), "\n");
    }
    In the OP's case, where the full data set has to be kept in memory for the final sort anyway, this approach may just be more complicated than ikegami's, and the ORDER BY also imposes an extra load on the database.

    But when you can output the data as you go and don't need to keep it in memory for a final processing stage, it may perform better, especially for large data sets, as its memory usage is fixed at O(1) rather than growing with the data set size, O(N).
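    The streaming variant described above can be sketched without the database at all. The rows, field values, and the emit helper below are invented for illustration; what matters is that once the input arrives sorted by tutorID, each tutor's closest row can be emitted the moment the next tutor begins, so only one candidate row is ever held in memory:

    ```perl
    use strict;
    use warnings;

    # Hypothetical rows, already sorted by tutorID (as "ORDER BY tutorID"
    # would deliver them); the field names mirror the thread's columns.
    my @rows = (
        { tutorID => 'a', Zipcode => '10001', Dist => 5.2 },
        { tutorID => 'a', Zipcode => '10002', Dist => 2.1 },
        { tutorID => 'b', Zipcode => '20001', Dist => 7.9 },
    );

    my $last_tutor_id = "";
    my $best;    # closest row for the current tutor only
    my @out;

    for my $row (@rows) {
        if ($row->{tutorID} ne $last_tutor_id) {
            # New tutor: the previous tutor's group is complete, emit it now.
            push @out, emit($best) if $best;
            $last_tutor_id = $row->{tutorID};
            $best = $row;
        }
        elsif ($row->{Dist} < $best->{Dist}) {
            $best = $row;
        }
    }
    push @out, emit($best) if $best;    # flush the final group

    print "$_\n" for @out;

    sub emit {
        my ($tutor) = @_;
        return join(', ', @{$tutor}{qw(tutorID Zipcode Dist)});
    }
    ```

    Note the trade-off: emitting as you go gives up the final sort by Dist, which is exactly why this only pays off when no final processing stage is needed.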