sgifford has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a Google Maps project to display a wide variety of homes throughout Michigan. My Perl script needs to GeoCode these addresses and put them in a database. I've tried http://geocoder.us, but it can't find about 2/3 of the addresses I send to it. I've also tried some demos of commercial services, and while they can locate all of my addresses, they're a hassle to get working in Perl, and expensive to boot. Does anybody know of another GeoCoding service that's easy to use from Perl, and is also as cheap as possible?

Updated: Retitled per consideration suggestion.

Replies are listed 'Best First'.
Re: Looking for a cheap, Perl-friendly GeoCoding service
by hossman (Prior) on Jul 08, 2005 at 07:06 UTC

    First of all: if you haven't seen Geo::Coder::US (and Geo::Coder::US::Import), you should take a look at them. With a moderate amount of initial setup, you can do all of your geocoding locally, which may be helpful if you need to deal with a large volume of addresses, or try lots of variants of each address.

    Second: Try lots of variants of each address. For example: http://geocoder.us/ does very poorly with the address "235 Second Street, 94105", but it does a great job with "235 2nd Street, 94105". Parsing the address into its components (using Geo::StreetAddress::US) and then trying to "tweak" the various components until you get a match may be the way to go.
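    The "Second" versus "2nd" tweak can be sketched in plain Perl. This is only an illustration: the %ORDINAL table and the numeric_ordinals function are hypothetical names, not part of Geo::Coder::US or Geo::StreetAddress::US, and the table covers only the first ten ordinals.

```perl
use strict;
use warnings;

# Hypothetical lookup table: spelled-out ordinals to their numeric form.
my %ORDINAL = (
    first   => '1st', second => '2nd', third  => '3rd', fourth => '4th',
    fifth   => '5th', sixth  => '6th', seventh => '7th', eighth => '8th',
    ninth   => '9th', tenth  => '10th',
);

# Return a copy of the address with spelled-out ordinals replaced.
sub numeric_ordinals {
    my ($address) = @_;
    my $words = join '|', keys %ORDINAL;
    # Match whole words only, case-insensitively, and look each one up.
    $address =~ s/\b($words)\b/$ORDINAL{lc $1}/gie;
    return $address;
}

print numeric_ordinals('235 Second Street, 94105'), "\n";
# prints "235 2nd Street, 94105"
```

    The rewritten string would then be handed to whichever geocoder you use; the original address should probably still be tried first.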

    Lastly: As you may or may not know, http://geocoder.us/ is powered by Geo::Coder::US and Geo::StreetAddress::US. Those modules are both documented to have issues with addresses in Michigan which contain letters in the building number. If you've got some time/energy/need, you might try patching them to fix that bug.

    UPDATE: Actually, I just realized that Geo::Coder::US->filter_ranges might be exactly what you need. I haven't tried it myself.

      Thanks, hossman, for an excellent reply. Any advice on automatically tweaking the address components? I expect to be GeoCoding quite a few addresses every day, and it would be too time-consuming to tweak every address by hand, but if a few heuristics can do the job, that would be great.

      Also, the issue here doesn't seem to be the Michigan bug; the addresses with errors are in suburban areas and don't have letters in their building numbers. Still, nice catch!

        I really haven't played with geocoding beyond the manual tweaking stage. The only two things that have really jumped out at me are numeric street names (which a mapping hash might help with) and dropping the street extension (i.e., sometimes people say "Haste Ave" when it's really "Haste Street"; if you drop the "Ave" and just look for "Haste", it might figure out what you want).
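        A rough sketch of the "drop the street extension and retry" idea follows. Everything here is an assumption for illustration: the list of extensions in $TYPES is short and hypothetical, address_variants and geocode_with_variants are made-up names, and the geocoder is passed in as a callback so the example runs without any geocoding module installed.

```perl
use strict;
use warnings;

# Hypothetical, deliberately short list of street extensions to try dropping.
my $TYPES = qr/(?:Street|St|Avenue|Ave|Road|Rd|Drive|Dr|Lane|Ln|Boulevard|Blvd)/i;

# Build candidate spellings of one address, with the original first.
sub address_variants {
    my ($address) = @_;
    my @variants = ($address);

    # Drop the first extension that sits right before a comma or the end.
    (my $no_type = $address) =~ s/\s+$TYPES\.?(?=\s*(?:,|\z))//;
    push @variants, $no_type if $no_type ne $address;

    return @variants;
}

# Try each variant until the caller-supplied geocoder returns something true.
sub geocode_with_variants {
    my ($address, $geocode) = @_;
    for my $variant (address_variants($address)) {
        my $match = $geocode->($variant);
        return ($variant, $match) if $match;
    }
    return;    # nothing matched
}

# Stand-in geocoder: only knows the extension-less form.
# The coordinates are dummy placeholder values, not real data.
my %known = ('2100 Haste, Berkeley, CA' => { lat => 1, long => 2 });
my ($used, $match) =
    geocode_with_variants('2100 Haste Ave, Berkeley, CA', sub { $known{ $_[0] } });
print "matched using: $used\n" if $match;
```

        In real use the callback would wrap your actual geocoder, and the variant list could grow one rule at a time as you find patterns in the failures.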

        I imagine that if I were really doing a lot of geocoding in bulk, I would run some tests that feed in a huge number of addresses and generate two logs: addresses that can't be parsed, and addresses that can be parsed but not located. Then I would manually review the lists (separately) and try to find patterns, pick a few examples, and see if an easy rule fixes those examples. If so, try it on all addresses that match the pattern, and add that rule to your code base.
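        The two-log idea could look something like the sketch below. It is only an outline under assumptions: triage_addresses is a hypothetical name, and the parser and locator are passed in as callbacks (in practice they might wrap Geo::StreetAddress::US and Geo::Coder::US). The stand-in callbacks here are toys so the example runs on its own.

```perl
use strict;
use warnings;

# Sort addresses into buckets using caller-supplied parse/locate callbacks.
sub triage_addresses {
    my ($addresses, $parse, $locate) = @_;
    my %report = (unparsed => [], unlocated => [], located => []);

    for my $address (@$addresses) {
        my $parts = $parse->($address);
        if (!$parts) {
            push @{ $report{unparsed} }, $address;    # log 1: can't be parsed
            next;
        }
        my $bucket = $locate->($parts) ? 'located' : 'unlocated';
        push @{ $report{$bucket} }, $address;         # 'unlocated' is log 2
    }
    return \%report;
}

# Toy parser: accepts only "NUMBER STREET, ZIP".
my $parse = sub {
    my ($number, $street, $zip) = $_[0] =~ /^(\d+)\s+(.+?),\s*(\d{5})$/
        or return;
    return { number => $number, street => $street, zip => $zip };
};

# Toy locator: pretends only one street exists.
my %streets = ('2nd Street' => 1);
my $locate  = sub { $streets{ $_[0]{street} } };

my $report = triage_addresses(
    [ '235 2nd Street, 94105', 'Second and Main', '9999 Nowhere Road, 48104' ],
    $parse, $locate,
);

for my $log (qw(unparsed unlocated)) {
    print "$log: $_\n" for @{ $report->{$log} };
}
```

        Keeping the two lists separate matters because they point at different fixes: unparsed addresses suggest a parsing rule, while unlocated ones suggest a variant to try or a gap in the data.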

        Note also the update to my original reply: take a good hard look at that method. It says it will apply the individual pieces of the address one at a time, and skip any that would result in no matches, which may be helpful in figuring out which test causes your problem addresses to fail. (You might have to add some logging to it, or pull out the logic into your own method, but it might give you a good starting point.)

        Lastly: if you've got a really large list of addresses that it can't find, contact missing {at} geocoder.us. It says right on their site that they are interested in hearing about legitimate addresses that can't be found; maybe they can spot the pattern and provide a fix for Geo::Coder::US.