I really haven't played with geocoding beyond the manual tweaking stage. The only two things that have really jumped out at me are numeric street names (e.g. "5th St" vs. "Fifth St", which a mapping hash might help with) and dropping the street suffix (i.e. sometimes people say "Haste Ave" when it's really "Haste Street"; if you drop the "Ave" and just look for "Haste", it might figure out what you want).
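
A minimal sketch of both ideas, assuming the street name has already been split out of the address. The ordinal table and the list of street types are hypothetical and deliberately short, and whether the geocoder prefers "5th" or "Fifth" is an assumption you'd want to test:

```perl
use strict;
use warnings;

# Hypothetical mapping hash: spelled-out ordinals to numeric form.
my %ORDINAL = (
    first => '1st', second => '2nd',  third   => '3rd', fourth => '4th',
    fifth => '5th', sixth  => '6th',  seventh => '7th', eighth => '8th',
    ninth => '9th', tenth  => '10th',
);

# Hypothetical list of trailing street-type words to drop on a retry.
my $STREET_TYPE = qr/\b(?:st|street|ave|avenue|rd|road|blvd|dr|drive)\.?$/i;

# Replace any spelled-out ordinal word with its numeric form.
sub normalize_street {
    my ($street) = @_;
    $street =~ s/\b([A-Za-z]+)\b/exists $ORDINAL{lc $1} ? $ORDINAL{lc $1} : $1/ge;
    return $street;
}

# Remove a trailing street type, e.g. "Haste Ave" -> "Haste",
# so a second, looser lookup can be tried.
sub drop_street_type {
    my ($street) = @_;
    (my $bare = $street) =~ s/\s+$STREET_TYPE//;
    return $bare;
}

print normalize_street('Fifth Avenue'), "\n";   # prints "5th Avenue"
print drop_street_type('Haste Ave'), "\n";      # prints "Haste"
```

The idea would be to look up the normalized name first and only fall back to the suffix-less name if that finds nothing, since dropping the suffix can also match the wrong street.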

I imagine that if I were really doing a lot of geocoding in bulk, I would run some tests that feed a huge number of addresses in and generate two logs: addresses that can't be parsed, and addresses that can be parsed but not located. Then I would manually review the lists (separately) and try to find patterns, pick a few examples, and see if an easy rule fixes those examples. If so, I'd try it on all addresses that match the pattern and add that rule to the code base.
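
Roughly, the test harness could look like this. It's only a sketch: the parser and geocoder are passed in as code refs (in practice the parser might wrap Geo::StreetAddress::US->parse_location and the locator whatever geocoding call you use), and the stand-ins in the usage part are made up purely for illustration:

```perl
use strict;
use warnings;

# Sort addresses into two buckets for manual review:
#   unparsed  - the parser returned nothing
#   unlocated - parsed, but the locator found no match
sub triage {
    my ($addresses, $parse, $locate) = @_;
    my (@unparsed, @unlocated);
    for my $addr (@$addresses) {
        my $parsed = $parse->($addr);
        if (!$parsed) {
            push @unparsed, $addr;
        }
        elsif (!$locate->($parsed)) {
            push @unlocated, $addr;
        }
    }
    return (\@unparsed, \@unlocated);
}

# Usage with hypothetical stand-ins for the parser and geocoder.
my @addrs = ('123 Main St, Springfield', 'garbage', '9 Nowhere Rd, Flint Twp');
my ($bad_parse, $bad_locate) = triage(
    \@addrs,
    sub { $_[0] =~ /^\d+\s/ ? { raw => $_[0] } : undef },  # fake parser
    sub { $_[0]{raw} !~ /Twp/ },                             # fake geocoder
);
print "unparsed: @$bad_parse\n";
print "unlocated: @$bad_locate\n";
```

Writing each list to its own file instead of STDOUT makes it easy to review the two kinds of failure separately.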

Note also the update to my original reply; take a good hard look at that method. It says it will apply the individual pieces of the address one at a time, and skip any that would result in no matches, which may be helpful in figuring out which test causes your problem addresses to fail. (You might have to add some logging to it, or pull the logic out into your own method, but it might give you a good starting point.)
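
To make the "skip any piece that gives no matches" idea concrete, here is a generic sketch with the logging already added. It is not the module's code; the candidate records, field names, and query are all hypothetical:

```perl
use strict;
use warnings;

# Apply address components as filters one at a time. A component that
# would leave zero candidates is skipped and logged instead of applied.
sub relax_filter {
    my ($candidates, $query, @fields) = @_;
    my @current = @$candidates;
    my @skipped;
    for my $field (@fields) {
        next unless defined $query->{$field};
        my @next = grep {
            defined $_->{$field} && lc $_->{$field} eq lc $query->{$field}
        } @current;
        if (@next) {
            @current = @next;
        }
        else {
            push @skipped, $field;   # this piece would have emptied the set
            warn "skipping '$field' => '$query->{$field}': no matches\n";
        }
    }
    return (\@current, \@skipped);
}

# Made-up records: the query says "Ave" but the data only has "St".
my @records = (
    { street => 'Haste', type => 'St', city => 'Berkeley'    },
    { street => 'Haste', type => 'St', city => 'Springfield' },
);
my ($matches, $skipped) = relax_filter(
    \@records,
    { street => 'Haste', type => 'Ave', city => 'Berkeley' },
    qw(street type city),
);
print scalar(@$matches), " match(es); skipped: @$skipped\n";
# prints "1 match(es); skipped: type"
```

The list of skipped fields is exactly the "which test made it fail" information you're after.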

Lastly: if you've got a really large list of addresses that it can't find, contact missing {at} geocoder.us. It says right on their site that they are interested in hearing about legitimate addresses that can't be found; maybe they can spot the pattern and provide a fix for Geo::Coder::US.

Re^4: Looking for a cheap, Perl-friendly GeoCoding service
by sgifford (Prior) on Jul 09, 2005 at 09:09 UTC
    Thanks again. Here's what I did that helped.

    First, most of the problems ended up being city names like "Flint Twp" instead of just "Flint". Normalizing with Geo::StreetAddress::US and removing city suffixes like Twp and Vlg got me down from 50 to about 15 failed addresses. A few more were fixed by changing "Mt" to "Mount" in city names, which got me down to 10. Five of those worked with other, more costly services.
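
    The city clean-up described above might look something like this minimal sketch. It assumes the city has already been pulled out of the parsed address as a plain string, and the suffix list only covers the Twp/Vlg cases mentioned here:

```perl
use strict;
use warnings;

# Drop trailing township/village suffixes and expand "Mt" to "Mount",
# e.g. "Flint Twp" -> "Flint", "Mt. Pleasant Vlg" -> "Mount Pleasant".
sub normalize_city {
    my ($city) = @_;
    $city =~ s/\s+(?:Twp|Vlg)\.?\s*$//i;   # trailing Twp / Vlg
    $city =~ s/\bMt\b\.?/Mount/gi;         # Mt or Mt. -> Mount
    return $city;
}

print normalize_city($_), "\n"
    for 'Flint Twp', 'Mt Morris', 'Mt. Pleasant Vlg';
# prints "Flint", "Mount Morris", "Mount Pleasant" (one per line)
```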

    What I'll probably do is write a Geo::Coder::BargainHunter module to try http://geocoder.us first, then http://geocode.com or similar for the trickier addresses.
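
    Geo::Coder::BargainHunter doesn't exist yet, so this is only a sketch of the core loop such a module might have: try each service in order, cheapest first, and stop at the first hit. Each backend is assumed to be a code ref returning a hash ref on success and undef (or dying) on failure; the two backends below are stand-ins, not real calls to geocoder.us or geocode.com:

```perl
use strict;
use warnings;

# Try each (name, code ref) backend in order and return the name of the
# first service that finds the address, plus its result.
sub bargain_geocode {
    my ($address, @backends) = @_;
    for my $backend (@backends) {
        my ($name, $code) = @$backend;
        my $result = eval { $code->($address) };
        if ($@) {
            warn "$name failed on '$address': $@";
            next;
        }
        return ($name, $result) if $result;
    }
    return;    # no service found it
}

# Hypothetical backends: the cheap one can't handle "Twp" addresses.
my @backends = (
    [ cheap  => sub { $_[0] =~ /Twp/ ? undef : { lat => 1, long => 2 } } ],
    [ pricey => sub { return { lat => 3, long => 4 } } ],
);
my ($service, $loc) = bargain_geocode('9 Nowhere Rd, Flint Twp', @backends);
print "$service: $loc->{lat}, $loc->{long}\n";    # prints "pricey: 3, 4"
```

    Returning the service name along with the result makes it easy to count how often the expensive fallback actually gets used.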

    The filter_ranges method looks useful, but it can't be called remotely over SOAP, and I don't have a local copy of the database (though I may go ahead and get one if we go live with this).