The Google Maps API provides geocoding functionality. Given some input (which can be unstructured text), it will return as many candidate geographic locations as it can find, along with an accuracy (i.e., confidence) rating for each.
Some manual analysis of the results for "real world" sample input from your search should help you find a sweet spot where you can ignore any response from the geocoding service that contains too many locations, or only locations with too low an accuracy.
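For instance, a rough sketch of that filtering idea might look like the following. This uses the current Geocoding web-service endpoint; the API key, the result-count threshold, and the use of the partial_match flag as a stand-in for the "accuracy" rating are all assumptions to tune against your own sample queries.

```python
# Sketch only: query the Google Geocoding web service and keep responses
# that look "confident".  API_KEY and MAX_RESULTS are placeholders.
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"   # placeholder
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
MAX_RESULTS = 3                   # more candidates than this => too ambiguous

def geocode_if_confident(text):
    """Return (lat, lng, formatted_address) if the service seems sure, else None."""
    resp = requests.get(GEOCODE_URL, params={"address": text, "key": API_KEY})
    results = resp.json().get("results", [])

    # Ignore empty or overly ambiguous responses.
    if not results or len(results) > MAX_RESULTS:
        return None

    best = results[0]
    # partial_match means the service only matched part of the input string,
    # a reasonable "low confidence" signal for free-text queries (assumption).
    if best.get("partial_match"):
        return None

    loc = best["geometry"]["location"]
    return loc["lat"], loc["lng"], best["formatted_address"]

if __name__ == "__main__":
    print(geocode_if_confident("pizza in Chicago, IL"))
    print(geocode_if_confident("cheap flights"))  # most likely None
```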
States are easier since you have a well-defined list (50 names + 50 postal abbreviations + a handful of alternate abbreviations: Mass, Miss, ...). For cities or anything smaller, the only thing I can think of is catching capitalized words (proper nouns). You then need some lexical flag to differentiate locations from people's names, perhaps prepositions like on, in, at? (See the sketch below.)
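A minimal sketch of that list-plus-proper-noun idea follows. The state table is truncated for brevity, and the preposition heuristic is only a guess at what would work on real query logs.

```python
# Match states against a fixed table; treat capitalized words after a
# locative preposition as candidate city names.  Illustrative only.
import re

STATES = {
    "alabama": "AL", "al": "AL",
    "illinois": "IL", "il": "IL", "ill": "IL",
    "massachusetts": "MA", "ma": "MA", "mass": "MA",
    "mississippi": "MS", "ms": "MS", "miss": "MS",
    # ... remaining states and alternate abbreviations ...
}

# Capitalized word(s) right after a locative preposition are taken as a
# candidate city name ("pizza in Naperville", "jobs near Fort Wayne").
CANDIDATE_CITY = re.compile(r"\b(?:in|at|near|around)\s+((?:[A-Z][a-z]+\s?)+)")

def find_locations(query):
    hits = {"states": set(), "cities": []}
    for word in re.findall(r"[A-Za-z]+", query):
        abbrev = STATES.get(word.lower())
        if abbrev:
            hits["states"].add(abbrev)
    hits["cities"] = [m.strip() for m in CANDIDATE_CITY.findall(query)]
    return hits

print(find_locations("best pizza near Naperville, IL"))
# {'states': {'IL'}, 'cities': ['Naperville']}
```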
Update: On further consideration, this is definitely a job for AI. I note a number of possibilities under search terms like AI, Bayes, and neural net, though a lot of what turns up is labeled alpha.
If you can't spot the pattern yourself, you won't be able to tell the computer what you want, unless you use an AI application that has some idea of how to construct a semantic parser on its own.
What is the concrete question?
Are there any specific formats involved (e.g., "Chicago, IL", "Chicago, Illinois"), or is this open-ended (e.g., "chicagoland", "Ill.", etc.)?
Because these are web queries, that open-endedness is part of the problem.