jrmtreebeard has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to add a website feature that calculates the distance between two US ZIP codes, then sorts search results by distance. Until today, I had good luck using the Geo::Google module to get the distance--just let Google do the work for me.

Geo::Google is super-easy to use, but I need something that isn't likely to break with every Google Maps API change.

Has anyone dealt with the 'zip code distance' problem recently? If I have to buy the data from USPS, I will, but it'd be far nicer to query a web service and store the results. Geocoder.us (Geo::Coder::US module) doesn't accept ZIP codes by themselves, or I'd happily use that.

--John

Re: ZIP code distances
by Cristoforo (Curate) on Aug 24, 2007 at 01:48 UTC
    I forget where I got my ZIP code file, but a search should turn up a downloadable one. I used the Geo::Ellipsoid module. I was really only playing around, so I can't say the code does much, but maybe it will give you an idea. The program figures the distance between two ZIP codes.

    Chris

    Update (major): I posted a bad binary search sub. :-(
    Eliminated the code. I thought it was ok but when I tested it against every zip code in the file, about 1/4 came back as unfound.

    Update2: As long as the file is read into memory anyhow, I could have used a hash (no need for a binary search).

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Geo::Ellipsoid;

    # Load every row into a hash keyed by ZIP code (see Update2). This
    # stands in for the number_bin_search() sub that was removed.
    open my $in, '<', 'zip_codes.csv' or die $!;
    my %codes;
    while (my $line = <$in>) {
        chomp $line;
        my ($zip) = split /,/, $line;
        $codes{$zip} = $line;
    }
    close $in or die $!;

    my ($zip1, $zip2) = @ARGV;
    for ($zip1, $zip2) {
        die "ZIP $_ not found\n" unless exists $codes{$_};
    }
    my @zip1 = split /,/, $codes{$zip1};
    my @zip2 = split /,/, $codes{$zip2};

    # .000621 converts meters (the module's default distance unit) to miles.
    my $geo = Geo::Ellipsoid->new(units => 'degrees');
    my $d = .000621 * $geo->range(@zip1[1,2], @zip2[1,2]);
    print "MILES: $d - @zip1[3,4] - @zip2[3,4]\n";
    __END__
    (small contents of 42,741 codes in zip_codes.csv)
    43787,+39.548994,-081.826194,STOCKPORT,OH,MORGAN,STANDARD
    43788,+39.795107,-081.370927,SUMMERFIELD,OH,NOBLE,STANDARD
    43789,+39.654386,-081.240732,SYCAMORE VALLEY,OH,MONROE,PO BOX ONLY
    43791,+39.871330,-082.098668,WHITE COTTAGE,OH,MUSKINGUM,PO BOX ONLY
    43793,+39.751516,-081.075921,WOODSFIELD,OH,MONROE,STANDARD
    43802,+40.090767,-081.855203,ADAMSVILLE,OH,MUSKINGUM,STANDARD
    43803,+40.357237,-081.643638,BAKERSVILLE,OH,COSHOCTON,PO BOX ONLY
    43804,+40.426559,-081.674440,BALTIC,OH,TUSCARAWAS,STANDARD
    43805,+40.398274,-081.968787,BLISSFIELD,OH,COSHOCTON,PO BOX ONLY
    43811,+40.247685,-081.929225,CONESVILLE,OH,COSHOCTON,STANDARD
    43812,+40.300934,-081.864066,COSHOCTON,OH,COSHOCTON,STANDARD
    43821,+40.106916,-081.999822,DRESDEN,OH,MUSKINGUM,STANDARD
    43822,+40.108668,-082.103212,FRAZEYSBURG,OH,MUSKINGUM,STANDARD
    43824,+40.364667,-081.755507,FRESNO,OH,COSHOCTON,STANDARD
    43828,+40.351271,-081.873607,KEENE,OH,COSHOCTON,PO BOX ONLY
    43830,+40.063883,-082.099574,NASHPORT,OH,MUSKINGUM,STANDARD
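
For anyone who would rather not install Geo::Ellipsoid, a rough alternative is the haversine formula. It treats the Earth as a sphere rather than an ellipsoid, so it is a different technique from the program above and the results will differ slightly. The sub name haversine_miles and the 3958.8-mile mean radius are assumptions made for this sketch; only the core Math::Trig module is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Math::Trig qw(deg2rad);

# Mean Earth radius in miles (an assumed constant for the spherical model).
use constant EARTH_RADIUS_MILES => 3958.8;

# Great-circle distance in miles between two points given in decimal degrees.
sub haversine_miles {
    my ($lat1, $lon1, $lat2, $lon2) = @_;
    my $dlat = deg2rad($lat2 - $lat1);
    my $dlon = deg2rad($lon2 - $lon1);
    my $h = sin($dlat / 2) ** 2
          + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dlon / 2) ** 2;
    return 2 * EARTH_RADIUS_MILES * atan2(sqrt($h), sqrt(1 - $h));
}

# Coordinates for 43787 (Stockport) and 43788 (Summerfield) from the sample rows.
printf "MILES: %.1f\n",
    haversine_miles(39.548994, -81.826194, 39.795107, -81.370927);
```

For ranking search results, the spherical approximation is probably accurate enough; the errors in ZIP centroid data are likely to be larger than the difference between the two models.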
Re: ZIP code distances
by BrowserUk (Patriarch) on Aug 24, 2007 at 04:07 UTC

    When I was exploring my ideas about Speeding up point-in-polygon -- take two I needed some sample data and I cast around and found this US government site that has zip files (sic) containing polygons for all the US 5-digit zip codes, by state.

    They are in a complex format, .e00, but there is a Geo::E00 module for reading them. The polygons are defined in terms of longitude and latitude with appropriate corrections etc. Although you don't need the polygons per se, each polygon represents a single ZIP code and each also has a centroid, which you could extract (the module has a method for doing this directly) and use to represent the entire ZIP code.

    Should be reasonably accurate data given the source, and best of all, it's legitimately free.
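
If you end up holding raw vertex lists rather than the centroid the module hands you, the centroid can be computed directly. The following is a minimal sketch of the standard area-weighted (shoelace) centroid, treating longitude and latitude as flat x/y coordinates, which is only an approximation for small polygons. The sub name polygon_centroid and the rectangle in the example are made up for illustration; nothing here uses the Geo::E00 API.

```perl
use strict;
use warnings;

# Area-weighted centroid of a simple polygon, treating lon/lat as planar x/y.
# $points is an array ref of [x, y] pairs; the ring may be open or closed.
sub polygon_centroid {
    my ($points) = @_;
    my @p = @$points;

    # Close the ring if the last vertex does not repeat the first.
    push @p, $p[0]
        unless $p[0][0] == $p[-1][0] && $p[0][1] == $p[-1][1];

    my ($area2, $cx, $cy) = (0, 0, 0);    # $area2 is twice the signed area
    for my $i (0 .. $#p - 1) {
        my ($x0, $y0) = @{ $p[$i] };
        my ($x1, $y1) = @{ $p[$i + 1] };
        my $cross = $x0 * $y1 - $x1 * $y0;
        $area2 += $cross;
        $cx    += ($x0 + $x1) * $cross;
        $cy    += ($y0 + $y1) * $cross;
    }
    die "degenerate polygon\n" if $area2 == 0;
    return ($cx / (3 * $area2), $cy / (3 * $area2));
}

# Hypothetical one-degree rectangle, given as [longitude, latitude] pairs.
my ($lon, $lat) = polygon_centroid(
    [ [-82, 39.5], [-81, 39.5], [-81, 40.5], [-82, 40.5] ]
);
printf "centroid: %.4f, %.4f\n", $lon, $lat;
```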

    Just a word of caution, from something I came across when I looked into this before: Zip_Codes_are_Not_Areas.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: ZIP code distances
by swampyankee (Parson) on Aug 24, 2007 at 02:56 UTC

    I've dealt with this specific issue. I'm sure that you've already found out that ZIP codes are messy beasts (there are actually places where a group of east-west streets will have different ZIPs than the north-south streets that intersect them). You may find Geo::PostalCode will suffice; I've not used it (I used ZIP code data provided by a company called Claritas; my then-employer was a market research firm, and I wrote the routines to deal with it in Fortran). One fairly significant issue is that not all ZIP codes are geographic entities. For example, some large companies have their own, as do some universities. The ZIP code data I dealt with had about 35,000 "real" ZIP codes, and a few thousand ZIP codes which were large businesses, college campuses, etc.

    In any case, I'd store the ZIPs and their corresponding latitudes and longitudes in a database, and only go off to the Web when a ZIP can't be found.
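
The "local store first, Web only on a miss" idea might be sketched like this. A plain hash stands in for the database table, and the remote lookup is passed in as a code reference; lookup_zip, %zip_db, and $fake_remote are all hypothetical names, and no real web service is called.

```perl
use strict;
use warnings;

# Local store of ZIP => [lat, lon]; a hash stands in for a real database table.
my %zip_db = (
    '43787' => [ 39.548994, -81.826194 ],
);

# Return [lat, lon] for a ZIP, consulting the local store first and calling
# the remote lookup only on a miss. Successful remote results are saved so
# each ZIP is fetched at most once. Returns undef if nobody knows the ZIP.
sub lookup_zip {
    my ($zip, $remote_lookup) = @_;
    return $zip_db{$zip} if exists $zip_db{$zip};
    my $coords = $remote_lookup->($zip);
    $zip_db{$zip} = $coords if defined $coords;
    return $coords;
}

# Stand-in for a web service call; counts how often it is invoked.
my $remote_calls = 0;
my $fake_remote = sub {
    my ($zip) = @_;
    $remote_calls++;
    return $zip eq '43788' ? [ 39.795107, -81.370927 ] : undef;
};

lookup_zip('43787', $fake_remote);    # already in the local store
lookup_zip('43788', $fake_remote);    # fetched remotely, then stored
lookup_zip('43788', $fake_remote);    # served from the local store
print "remote calls: $remote_calls\n";
```

Passing the remote lookup as a code reference keeps the caching logic independent of whichever service is used, so swapping providers would not touch lookup_zip.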


    minor editorial correction


    emc

    Information about American English usage here and here.

    Any New York City or Connecticut area jobs? I'm currently unemployed.

      Geo::PostalCode works really well. However, the data source it suggests (1999 US Census data) is worthless here in Northern MA. Natick and Woburn (20 miles apart) have the same lat/long coordinates!
        However, the data source it suggests (1999 US Census data) is worthless here in Northern MA. Natick and Woburn (20 miles apart) have the same lat/long coordinates!

        Not only obsolete, but wrong from its inception. The USPS and US Census Bureau have pointers to what is probably the closest thing to an "official" list of ZIP code locations. One trouble with any static source is that ZIP codes get added fairly often, so any static data source is quickly obsolete. With the Census Bureau's TIGER data, there is also the problem that there are ZIP codes the Census Bureau doesn't care about, as they are non-residential, e.g. some businesses may have their own ZIP codes.
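
A cheap sanity check for this kind of bad data is to look for different ZIP codes that share exactly the same coordinates. The sketch below assumes rows shaped like "zip,lat,lon,..." (the layout of the CSV sample earlier in the thread); the sub name shared_coordinates is hypothetical and the three rows are invented for illustration, not real ZIP data.

```perl
use strict;
use warnings;

# Group ZIP codes by their exact "lat,lon" string and return a hash ref
# containing only the coordinate pairs claimed by more than one ZIP.
sub shared_coordinates {
    my (@rows) = @_;
    my %zips_at;
    for my $row (@rows) {
        my ($zip, $lat, $lon) = split /,/, $row;
        push @{ $zips_at{"$lat,$lon"} }, $zip;
    }
    my %shared;
    for my $coords (keys %zips_at) {
        $shared{$coords} = $zips_at{$coords} if @{ $zips_at{$coords} } > 1;
    }
    return \%shared;
}

# Made-up rows for illustration only.
my @rows = (
    '00001,+42.000000,-071.000000,TOWN A,MA',
    '00002,+42.000000,-071.000000,TOWN B,MA',
    '00003,+42.500000,-071.200000,TOWN C,MA',
);

my $suspects = shared_coordinates(@rows);
for my $coords (sort keys %$suspects) {
    print "$coords shared by @{ $suspects->{$coords} }\n";
}
```

A shared coordinate is not always an error (a business ZIP can sit inside a residential one), so this flags rows to inspect rather than rows to delete.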



Re: ZIP code distances
by planetscape (Chancellor) on Aug 24, 2007 at 03:16 UTC
Re: ZIP code distances
by moklevat (Priest) on Aug 24, 2007 at 01:42 UTC
Re: ZIP code distances
by davebaker (Pilgrim) on Aug 24, 2007 at 03:29 UTC
Re: ZIP code distances
by Popcorn Dave (Abbot) on Aug 24, 2007 at 01:51 UTC
    One other option may be a combination of MapQuest and WWW::Mechanize. I know I've used MapQuest to get a generic distance between two cities. Obviously you'd have to check their TOS first.

    HTH!


    Revolution. Today, 3 O'Clock. Meet behind the monkey bars.

    I would love to change the world, but they won't give me the source code

Re: ZIP code distances
by jrmtreebeard (Novice) on Aug 24, 2007 at 22:02 UTC

    Thanks to everyone for their advice. It definitely makes sense to have an existing database of distances (or latitudes and longitudes) and check it first, before venturing out onto the web.

    Moklevat's suggestion of http://sourceforge.net/projects/zips/ gives me a pre-packaged CSV file, based on the 2000 US Census data, of zip codes + latitudes and longitudes. Assuming that the data is accurate enough (I'll use Google to do some comparisons), it's a good basis for my work (http://mvhub.com/).
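
Once the CSV is loaded into a ZIP => [lat, lon] table, sorting search results by distance from the searcher's ZIP could look roughly like this. It is a sketch using a Schwartzian transform and the core Math::Trig great_circle_distance function (a spherical model with an assumed 3958.8-mile radius); sort_by_distance and to_sphere are hypothetical names, and the coordinates come from the sample rows posted earlier in the thread.

```perl
use strict;
use warnings;
use Math::Trig qw(great_circle_distance deg2rad);

# Convert (lat, lon) in degrees to the (theta, phi) radians Math::Trig expects:
# theta is the longitude, phi is measured down from the North Pole.
sub to_sphere {
    my ($lat, $lon) = @_;
    return (deg2rad($lon), deg2rad(90 - $lat));
}

# Return the ZIPs in @$results ordered by distance (miles) from $origin.
# %$coords maps ZIP => [lat, lon]. Result ZIPs without coordinates are
# dropped; $origin is assumed to be present in %$coords.
sub sort_by_distance {
    my ($origin, $results, $coords) = @_;
    my @from = to_sphere(@{ $coords->{$origin} });
    return
        map  { $_->[1] }
        sort { $a->[0] <=> $b->[0] }
        map  { [ great_circle_distance(@from, to_sphere(@{ $coords->{$_} }), 3958.8), $_ ] }
        grep { exists $coords->{$_} } @$results;
}

my %coords = (
    '43812' => [ 40.300934, -81.864066 ],    # Coshocton
    '43824' => [ 40.364667, -81.755507 ],    # Fresno
    '43828' => [ 40.351271, -81.873607 ],    # Keene
    '43804' => [ 40.426559, -81.674440 ],    # Baltic
);

my @sorted = sort_by_distance('43812', [ '43804', '43828', '43824' ], \%coords);
print "nearest first: @sorted\n";
```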

    FYI, the US Postal Service charges $50/state and $700/all ZIP codes for this data (TIGER/ZIP+4). The woman I spoke with said that the data was updated every two or three years, which isn't that much better than using the Census data.

    As for Geo::Google, one of its developers, Michael Trowbridge, responded to my bug report within half an hour, offering a patch. Turns out that Google's responses switched from UTF-8 character encoding to ASCII.

    --John