Harch84 has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys really hoping someone can advise me with this. I have a Perl script that connect to a PostgreSQL database and retrieves fields and rows from a database of bus timetables and bus stop information. The data I am extracting is the reference number of the bus stop, the route number this bus stop serves and the distance of a point to this bus stop. The select query returns rows like this in PostgreSQL:
stop_reference | service_id | distance ----------------+------------+------------------ 6200200581 | 1 | 131 6200249240 | 2 | 148.32734070292 6200249230 | 2 | 247.695377429616 6200249250 | 2 | 394.507287638644 6200249220 | 2 | 394.81641303269
Now what I want to be able to do is to loop through each of the rows from this query in Perl, and to find where the service_id is the same. And make an IF statement that says when they are the same only return the row with the shortest distance. This is the part of the Perl script that brings my query into Perl:
$originsql = qq{ SELECT a.stop_reference, b.service_id, distance....}; $sth = $dbh->prepare( $originsql ); $sth->execute(); $sth->bind_columns( undef, \$stops, \$service, \$distance ); while( $sth->fetch() ) {
Im hoping someone can help me to do this? If you need anymore information please let me know. Thanks

Replies are listed 'Best First'.
Re: Help with manipulating data from a database??
by ForgotPasswordAgain (Vicar) on Jul 02, 2007 at 14:17 UTC

    If I understand correctly, you want the minimum distance (along with the stop_reference) for each service_id. If you didn't need the stop_reference, you could just do a GROUP BY:

    -- not what you want SELECT service_id, min(distance) FROM table GROUP BY service_id;

    But then you lose the stop_reference information, because you've "grouped" that away.

    So what I'd probably do is use ORDER BY:

    SELECT stop_reference, service_id, distance FROM table ORDER BY service_id, distance;

    Then you just print out (or store) every time the service_id changes:

    my $last_service_id = -8675309; # for a good time, call while ($sth->fetch()) { if ($service != $last_service_id) { print "min dist for service '$service' = '$distance' at stop ' +$stops'\n"; $last_service_id = $service; } }
      Ummm... You know you can GROUP BY multiple fields, right?
      SELECT stop_reference, service_id, min(distance) FROM table GROUP BY stop_reference, service_id ;
      The basic rule is that you can include as many fields in a summary query as you like, provided that each one of them has either a summary operation (such as min()) or a GROUP BY applied to it. (This does rule out anything like SELECT * without going to a subquery, but, then, I only use SELECT * when I'm querying by hand anyhow. Using it in my code carries too much risk of breakage if the table's columns get reordered.)
        I thought about the double GROUP BY, but I think that'd only work if stop_reference is an ID-like column, with distinct values, right? I was assuming that since stop_reference is referring to bus stops, that the same stop_reference might occur more than once (for example where two or more bus lines intersect). In that case, I think the GROUP BY would no longer work because you'd "group away" some extra rows, right?
      That is exactly the answer I was looking for thanks so much to everyone for their advice and help and especially to ForgotPasswordAgain for his answer :-) You guys really are Perl legends!! Some day ill hope to repay you with some knowledge of my own!
Re: Help with manipulating data from a database??
by bobf (Monsignor) on Jul 02, 2007 at 13:55 UTC

    An SQL guru could probably show you how to do this in the query statement (perhaps using a MIN() function), rather than retrieving a lot of data that you really aren't interested in and then filtering through it, but I don't have enough mojo to whip it off without testing and I don't have time to test right now. :-)

    That said, you could declare a hash (my %bus_stops) before the while loop and then do something like this in the loop itself:

    next if( $bus_stops{$service}{dist} < $distance ); $bus_stops{$service}{dist} = $distance; $bus_stops{$service}{stop_ref} = $stops;

    You may also find the fetchall_arrayref() and fetchall_hashref() methods useful. They allow you to retrieve all of the results at once (rather than looping over a fetch statement), but the practicality of doing that depends on how many results you plan to retrieve. :-)

Re: Help with manipulating data from a database??
by archfool (Monk) on Jul 02, 2007 at 13:55 UTC
    Two SQL queries.
    SELECT DISTINT service_ID from ..... Fetch into array and loop... foreach (serviceID) { SELECT stop_reference, distance from XXX WHERE service_ID = :p1 AND distance = (SELECT min(distance) from XXX where service_id = : +p1) }
    I'm sure some SQL guru can wrap this all into one SQL query. In Oracle, you could use PL/SQL to get this, too. But even in pure SQL you should be able to do this without a loop. I'm just not an SQL guru. :(
Re: Help with manipulating data from a database??
by mikeB (Friar) on Jul 02, 2007 at 15:30 UTC
    You could use ForgotPasswordAgain's initial "won't work" idea as a subquery to grab the full record for the shortest distances.
    select a.* from table_name a join ( select min(distance) as min_dist, service_id as min_id from table_name group by service_id ) b on (a.distance = min_dist and a.service_id = min_id);
    If (service_id, distance) is not a unique key, this will return all the stop_references with the minimal distance. This may be a Good Thing, depending on the application. If you want only one result, adding the appropriate ORDER BY and LIMIT 1 clauses should do the trick.
    /is not a guru //tried it and it worked for me
Re: Help with manipulating data from a database??
by swampyankee (Parson) on Jul 02, 2007 at 16:05 UTC

    Please show the full query. It may be possible to modify the query, itself, to do what you ask.

    If modifying the query to do all the work isn't possible, and presuming service_id is the route number, I would loop through the data, filling an array indexed by service_id. When the distance is less than the existing distance, replace it with the distance just read. Filling in your while loop with some guaranteed-to-look-like-code pseudo-code:

    my @stop_list; my @distance; while($sth -> fetch()){ if((not defined($distance[$service]) or $distance[$service] > $distance) { $stop_list[$service] = $stops; $distance[$service] = $distance; } } foreach $service (@stop_list) { print "Stop: $stop_list[$service] distance: $distance[$service]\n" }
    Note that this is absolutely untested; it's pseudo-code.

    This will set $stop_list[$service] to the appropriate stop number and $distance[$service] to the appropriate $stops and $distance, respectively, if the current value of $distance[$service] is undefined or greater than the distance just returned by the fetch.

    corrected markup

    emc

    Any New York City or Connecticut area jobs? I'm currently unemployed.

    There are some enterprises in which a careful disorderliness is the true method.

    —Herman Melville
Re: Help with manipulating data from a database??
by johngg (Canon) on Jul 03, 2007 at 13:37 UTC
    I would think that doing as much of the work as possible in the SQL query, as suggested by other Monks, makes most sense. I have not done enough SQL to usefully contribute there. Looking at the documentation for DBI I saw the fetchall_arrayref method. If going the simple query route, you could use it to populate a hash keyed by service_id with the results of the query sorted by descending distance. Thus the shortest distance row for each service would be assigned to the hash last, overwriting any longer distances. I don't have DBI available (or a database for that matter) so have made up a pretend routine that returns a ref. to an AoA and I added a little more pretend data.

    use strict; use warnings; use Data::Dumper; sub pretendFetch { my $cannedFetch = [ [6200249230, 2, 247.695377429616], [6200200581, 1, 131], [6200249220, 2, 394.81641303269], [6200249240, 2, 148.32734070292], [6200249269, 1, 127.84583439484], [6200249250, 2, 394.507287638644], [6200248437, 1, 234.38349845484], ]; return $cannedFetch; } my %service_id = map { $_->[1] => { stop_reference => $_->[0], distance => $_->[2] } } sort { $b->[2] <=> $a->[2] } @{ pretendFetch() }; print Data::Dumper->Dump([\%service_id], [q{*service_id}]);

    Which produces

    %service_id = ( '1' => { 'distance' => '127.84583439484', 'stop_reference' => '6200249269' }, '2' => { 'distance' => '148.32734070292', 'stop_reference' => '6200249240' } );

    I hope this is of interest.

    Cheers,

    JohnGG