Jerry has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monks, I have been asked to code a search engine in which the user supplies their zipcode and is presented with a list of product distributors in their area. At this point, I'm merely comparing the user's zipcode numerically to each one in the database and deciding whether to return or discard each record. Is there a better way to do this? In theory, postal codes that are numerically close should also be geographically close, but there are exceptions. Any advice would be appreciated.

-Jerry
http://www.digilliance.net

Replies are listed 'Best First'.
Re: Zipcode search engine
by Jazz (Curate) on Sep 10, 2001 at 04:36 UTC
    Update: Code updated to do reverse lookups. Now can do either city/state to zips and zip to city/state.

    True, in theory, postal codes fairly closely relate to proximal cities, but where to start and stop the range?

    For example, New York City, NY, a comparatively small city (in square miles) has 162 zip codes ranging from 10001 -> 10292. Meaning, within that range there are 129 unused zip code numbers.

    Fayetteville, AR, much larger in area (950 sq miles) than NYC (309 sq miles), has only 4 zip codes 72701 - 72704. After NYC's 10292 zip, the next used zip is 10301, Staten Island. A short ferry ride away (or a bridge, if you're in Brooklyn), but a different island altogether and not the most convenient "closest store" from Manhattan.

    So the question is what range should be associated with what region? Since the answer is difficult (if not impossible) to guess, let's defer to the USPS.

    The following code will take a city and state and return all other zip codes for that city. It may not offer as broad a range as you'd like (nearby cities?), but at least you know that you're really pointing them to a location that's "nearby".

    It also will let you check to see if a zip code that you're "guessing" is actually in the same area by returning the associated city/state (hint -- offer the cities and let the user select which is closest and/or most convenient to them).

    #!/usr/bin/perl -w
    use CGI qw / :standard /;
    use LWP::UserAgent;
    use strict;

    print header();

    # use z to enter a zip and get the city and state
    # use cs to enter the city, state and get all available zip codes for the area
    my $query = 'z';

    my $city  = 'New York';
    my $state = 'NY';
    my $zip   = '10001';

    my $usps_form = 'http://www.usps.gov/cgi-bin/zip4/ctystzip2';

    # prepare agent
    my $ua = LWP::UserAgent->new();
    $ua->agent( 'AgentName/0.1 ' . $ua->agent );

    # prepare request
    my $req = HTTP::Request->new( POST => $usps_form );
    $req->content_type( 'application/x-www-form-urlencoded' );
    $req->content( "ctystzip=$city $state&Submit=Process" ) if $query eq 'cs';
    $req->content( "ctystzip=$zip&Submit=Process" )         if $query eq 'z';

    # process request
    my $res = $ua->request( $req );

    # process result
    if ( $res->is_success ) {

        # city/state to zip query
        if ( $query eq 'cs' ) {
            my @zips  = zip_from_city_state( $res->content );
            my $count = @zips;
            print p( "$count zip codes for $city, $state" ),
                  blockquote( join( ", ", @zips ) );
        }

        # zip to city/state query
        elsif ( $query eq 'z' ) {
            my $cs_aref = city_state_from_zip( $res->content );
            print p( "Cities listed for $zip:" ), '<BLOCKQUOTE>';
            print "$_->{'c'}, $_->{'s'}", br() foreach @$cs_aref;
            print '</BLOCKQUOTE>';
        }
        else {
            die 'Bad query.';
        }
    }
    else {
        warn 'USPS down, bad request, or no results.';
    }

    # pull the 5-digit zips out of the <PRE> block of the result page
    sub zip_from_city_state {
        my @zips;
        # there has to be a more efficient way to do this :)
        foreach ( split( /<BR>/i, ( split( m|</?PRE>|i, $_[0] ) )[1] ) ) {
            my $zip = substr( $_, 0, 5 );
            push @zips, $zip if $zip =~ /^\d{5}/;
        }
        return @zips;
    }

    # pull the fixed-width city/state columns out of the <PRE> block
    sub city_state_from_zip {
        # start with an empty array ref so the caller can always dereference it
        my ( $res, $cs_aref ) = ( shift, [] );
        $res =~ s/<BR>/\n/gi;
        # there has to be a more efficient way to do this :)
        my @lines = split( "\n",
            ( split( /--+/, ( split( m|</?PRE>|i, $res ) )[1] ) )[1] );
        for my $line ( @lines ) {
            my ( $city, $state ) = unpack( 'A27A2', $line );
            push @$cs_aref, { 'c' => $city, 's' => $state } if $city and $state;
        }
        return $cs_aref;
    }

    The above should be pretty self-explanatory, but if you have any questions, just holler.

    Any optimizations or suggestions (both desired) would be most welcome :)

    Jasmine

Re: Zipcode search engine
by shotgunefx (Parson) on Sep 10, 2001 at 02:53 UTC
    You can buy zipcode data pretty cheap (around $250 for a license) that contains the latitude and longitude of the center of each US zip code. Most of the companies that sell this data also provide the formulas to calculate the distance between two zip codes from it.

    I used it for a project before but I don't have the name of the company handy. I'll look for it and post it if I find the company.

    UPDATE
    The zipcode db I was speaking of is available at zipinfo.com. They provide several ways of calculating distance with various degrees of precision.
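
    As a rough illustration of the kind of distance formula such vendors supply, here is a minimal sketch of the haversine (great-circle) calculation in plain Perl. It is not taken from any vendor's documentation: the function name zip_distance_miles, the Earth radius constant, and the sample coordinates (approximate city centers, not real zip centroids) are all assumptions made for the example.

    ```perl
    use strict;
    use warnings;

    my $PI                 = 4 * atan2( 1, 1 );
    my $EARTH_RADIUS_MILES = 3958.8;              # mean Earth radius (assumed constant)

    # Great-circle distance in miles between two points given in decimal degrees.
    # zip_distance_miles is a hypothetical helper name.
    sub zip_distance_miles {
        my ( $lat1, $lon1, $lat2, $lon2 ) = map { $_ * $PI / 180 } @_;

        my $dlat = $lat2 - $lat1;
        my $dlon = $lon2 - $lon1;

        # haversine term; clamp to 1 to guard against floating-point overshoot
        my $h = sin( $dlat / 2 )**2
              + cos($lat1) * cos($lat2) * sin( $dlon / 2 )**2;
        $h = 1 if $h > 1;

        return 2 * $EARTH_RADIUS_MILES * atan2( sqrt($h), sqrt( 1 - $h ) );
    }

    # Approximate, illustrative coordinates only (not purchased centroid data)
    my @new_york     = ( 40.75, -73.99 );
    my @fayetteville = ( 36.06, -94.16 );

    printf "%.0f miles\n", zip_distance_miles( @new_york, @fayetteville );
    ```

    With real centroid data you would look up both zip codes first and pass their coordinates in the same order (latitude, longitude).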

    -Lee

    "To be civilized is to deny one's nature."
Re: Zipcode search engine
by tachyon (Chancellor) on Sep 10, 2001 at 03:42 UTC

    At this point, I'm merely numerically comparing the user's zipcode to each one in the database and determining whether to return or discard each record. Is there a better way to do this?

    Probably. This sounds like you are using a flat-file 'database' and simply iterating from the beginning of it to the end. For a small file this works fine, but it breaks down when the file gets bigger and/or demand for access to it becomes high. One way to arrange this in a proper database would be to have two tables:

    Zip Table
    Zip Code, Distributor Code/s
    71000, 1|2|3
    72000, 3|4
    ....

    Distributor Table
    Dist ID, Lotsa other Data :-)
    1, She'll be right INC
    2, No Worries Mate INC
    3, Can Do Corp
    4, The Impossible Done Yesterday P/L

    You do the lookup on the Zip table (really basic SQL) to get the Dist ID(s) for that area (pipe delimited here). You then grab all those distributors from the Dist table. Presumably the Distributor ID table already exists? So you just need to generate a Zip table that cross-references each zip code to the Dist IDs. You could do this manually using a map, or automate it using the zip code geographic position data mentioned elsewhere. You would probably need to get geographic coordinates for your distributors, but this is pretty easy. You can get the longitude and latitude for any US city free from many of the online mapping services. See jcwren's monk map page (on the stats pages) for one such link.
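
    The two-step lookup described above can be sketched without a database at all, using two in-memory hashes in place of the two tables. This is only an illustration under that assumption: the sample rows are the ones shown earlier in this post, and distributors_for_zip is a hypothetical helper name.

    ```perl
    use strict;
    use warnings;

    # Zip table: zip code => pipe-delimited distributor IDs
    my %zip_table = (
        '71000' => '1|2|3',
        '72000' => '3|4',
    );

    # Distributor table: Dist ID => distributor record
    my %dist_table = (
        1 => "She'll be right INC",
        2 => 'No Worries Mate INC',
        3 => 'Can Do Corp',
        4 => 'The Impossible Done Yesterday P/L',
    );

    # Step 1: look up the zip to get its Dist IDs.
    # Step 2: fetch each of those distributors from the Dist table.
    sub distributors_for_zip {
        my ($zip) = @_;
        my $ids = $zip_table{$zip};
        return () unless defined $ids;    # no distributors listed for this zip
        return map { $dist_table{$_} }
               grep { exists $dist_table{$_} }
               split /\|/, $ids;
    }

    print "$_\n" for distributors_for_zip('72000');
    ```

    Both steps are single hash lookups, so the cost no longer grows with the number of records the way a scan of the whole file does.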

    You could implement this using DBI and DBD::CSV making it easy to move to a real database in the future.

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      A possible problem with tachyon's code is that it doesn't work unless you happen to be in a zip code with distributors.

      If you call the USPS National Customer Support Center at (800) 238-3150, you can talk to them about ordering zip/latitude/longitude data.

      The documentation for this data is available as a pdf on the usps site.

      It seems like most people find a vendor to give them this data. The USPS FAQ talks about it a bit.

      I'm guessing that the ol' a**2 + b**2 = c**2 would be how you would compute the shortest distance between two arbitrary zip codes. No doubt the monk map uses nicer code...

      use strict;
      use warnings;

      my $x1 = 10;
      my $y1 = 20;
      my $x2 = -8;
      my $y2 = 2;

      my $dist = sqrt( ( $x1 - $x2 )**2 + ( $y1 - $y2 )**2 );
      print "$dist =distance";
      Of course you have to (pre?) compute the distance between the user's zip code and many of the other zip codes in your database...
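
      One caveat on applying Pythagoras directly to latitude/longitude: a degree of longitude gets shorter as you move away from the equator, so the longitude difference needs scaling by the cosine of the latitude. A minimal sketch of that flat-earth (equirectangular) approximation, plus ranking distributor zips by distance from the user's zip, might look like the following. The centroid table holds made-up zips and coordinates, and the names approx_miles and zips_within are hypothetical.

      ```perl
      use strict;
      use warnings;

      my $PI           = 4 * atan2( 1, 1 );
      my $EARTH_RADIUS = 3958.8;              # mean Earth radius in miles (assumed)

      # Pythagoras on lat/long differences (in radians), with the longitude
      # difference scaled by cos(mean latitude). Reasonable for short distances.
      sub approx_miles {
          my ( $lat1, $lon1, $lat2, $lon2 ) = @_;
          my $mean_lat = ( $lat1 + $lat2 ) / 2 * $PI / 180;
          my $x = ( $lon2 - $lon1 ) * $PI / 180 * cos($mean_lat);
          my $y = ( $lat2 - $lat1 ) * $PI / 180;
          return $EARTH_RADIUS * sqrt( $x**2 + $y**2 );
      }

      # Hypothetical centroid table: zip => [ latitude, longitude ] (made-up values)
      my %centroid = (
          '00001' => [ 40.00, -75.00 ],
          '00002' => [ 40.10, -75.00 ],
          '00003' => [ 41.00, -75.00 ],
      );

      # Distributor zips within $radius miles of $user_zip, nearest first.
      sub zips_within {
          my ( $user_zip, $radius, @dist_zips ) = @_;
          my $here = $centroid{$user_zip} or return ();
          my %miles = map { ( $_ => approx_miles( @$here, @{ $centroid{$_} } ) ) }
                      grep { exists $centroid{$_} } @dist_zips;
          return sort { $miles{$a} <=> $miles{$b} }
                 grep { $miles{$_} <= $radius } keys %miles;
      }

      print join( ', ', zips_within( '00001', 100, '00003', '00002' ) ), "\n";
      ```

      Because the search is by radius rather than by exact zip match, a user in a zip code with no distributors still gets results.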



      --mandog

      Wouldn't it be preferable to adhere to normalization rules and have the zipcode database in the following form, with one row per zip/distributor pair?

      zip distributor
      71000 0001
      71000 0002
      71000 1953
      71001 1113
      71004 0006
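
      A small sketch of how the one-row-per-pair layout might be queried, using the rows above held in memory rather than in a real database. The helper name distributor_ids is hypothetical.

      ```perl
      use strict;
      use warnings;

      # One row per (zip, distributor) pair, copied from the table above
      my @rows = (
          [ '71000', '0001' ],
          [ '71000', '0002' ],
          [ '71000', '1953' ],
          [ '71001', '1113' ],
          [ '71004', '0006' ],
      );

      # Index the rows: zip => [ distributor IDs ]
      my %by_zip;
      push @{ $by_zip{ $_->[0] } }, $_->[1] for @rows;

      # All distributor IDs for a zip, or an empty list if there are none
      sub distributor_ids {
          my ($zip) = @_;
          return @{ $by_zip{$zip} || [] };
      }

      print join( ', ', distributor_ids('71000') ), "\n";
      ```

      In SQL the same lookup would be a single SELECT on the zip column, with no pipe-delimited field to split afterwards.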

Re: Zipcode search engine
by George_Sherston (Vicar) on Sep 10, 2001 at 01:15 UTC
    Is it a processing-speed issue? If so, I would have thought you could do something by adding to your db structure. Perhaps another table that had one record per zip code, with that record holding the IDs of all the distributors in that zip code. Then you've got two very quick lookups rather than one slowish one. I know this isn't a perl solution! But it sounds more like a db problem (I may have misunderstood - stranger things have happened), and there IS more than one way to do it.

    § George Sherston