I really haven't played with geocoding beyond the manual tweaking stage. The only two things that have really jumped out at me are numeric street names (which a mapping hash might help with) and dropping the street suffix (e.g. sometimes people say "Haste Ave", but it's really "Haste Street" ... if you drop the "Ave" and just look for "Haste", it might figure out what you want).
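To make the suffix idea concrete, here is a rough sketch (in Python, just to show the logic; it ports to Perl easily). Everything in it is made up for illustration: the suffix list is a guess, and `lookup` is a stand-in for whatever geocoder call you actually use, assumed to return a list of matches that is empty when nothing is found. It is not the Geo::Coder::US API.

```python
# Hypothetical suffix list; extend it to fit your own data.
STREET_SUFFIXES = {"ave", "avenue", "st", "street", "rd", "road",
                   "blvd", "dr", "ln", "way"}


def drop_street_suffix(street):
    """Return the street name with a trailing suffix word removed, if any."""
    words = street.split()
    # Never strip the only word, or "Ave" by itself would become empty.
    if len(words) > 1 and words[-1].rstrip(".").lower() in STREET_SUFFIXES:
        return " ".join(words[:-1])
    return street


def geocode_with_fallback(number, street, city, lookup):
    """Try the address as given, then retry once without the street suffix.

    `lookup` is an assumed callable (number, street, city) -> list of matches.
    """
    matches = lookup(number, street, city)
    if matches:
        return matches
    bare = drop_street_suffix(street)
    if bare != street:
        return lookup(number, bare, city)
    return matches
```

The retry only happens when the first lookup comes back empty, so addresses that already work are never touched by the rule.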
I imagine that if I were really doing a lot of geocoding in bulk, I would run some tests that feed a huge number of addresses in and generate two logs: addresses that can't be parsed, and addresses that can be parsed but not located. Then I would manually review the two lists (separately) and try to find patterns, pick a few examples, and see if an easy rule fixes those examples. If so, try it on all addresses that match the pattern, and add that rule to your code base.
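A bare-bones version of that workflow might look like the sketch below (Python again, purely for illustration). The `parse` and `locate` callables are hypothetical stand-ins for your real parsing and lookup steps: `parse` is assumed to return None when it can't make sense of an address, and `locate` is assumed to return an empty result when it finds nothing.

```python
def triage(addresses, parse, locate):
    """Split failures into two lists: unparseable, and parsed but not located."""
    unparsed, unlocated = [], []
    for addr in addresses:
        parsed = parse(addr)
        if parsed is None:
            unparsed.append(addr)
        elif not locate(parsed):
            unlocated.append(addr)
    return unparsed, unlocated


def rule_fixes(rule, failures, parse, locate):
    """Return the failed addresses that geocode once `rule` rewrites them."""
    fixed = []
    for addr in failures:
        rewritten = rule(addr)
        if rewritten == addr:
            continue  # the rule does not apply to this address
        parsed = parse(rewritten)
        if parsed is not None and locate(parsed):
            fixed.append(addr)
    return fixed
```

Write the two lists from `triage` out to your log files, eyeball them for a pattern, then hand a candidate rewrite rule to `rule_fixes` to see how many of the failures it actually rescues before you commit it to your code.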
Note also the update to my original reply ... take a good hard look at that method. It says it will apply the individual pieces of the address one at a time, and skip any that would result in no matches, which may be helpful in figuring out which test causes your problem addresses to fail. (You might have to add some logging to it, or pull out the logic into your own method, but it might give you a good starting point.)
Lastly: if you've got a really large list of addresses that it can't find, contact missing {at} geocoder.us. It says right on their site that they are interested in hearing about legitimate addresses that can't be found; maybe they can spot the pattern and provide a fix for Geo::Coder::US.