in reply to International Addresses

I think that's quite an ambitious project.
Validation can be a huge pain, mainly in acquiring the appropriate data (ZIP codes, postal codes, etc.) for every place in the world.
I think you will end up validating an address the way you validate an email address: it can be valid in form, but there is no way to know whether it works until you send something to it!

How will you be getting this information? You might be better off identifying the elements of an address and working those elements into per-country "definitions". I don't know whether address requirements vary between regions within a country, but working backwards like this, you could 1. use the definitions to check that an address is valid in form and 2. reuse the definitions in other programs to acquire the appropriate data.
You may or may not find the following modules helpful:
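To make the "definitions" idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `ADDRESS_DEFINITIONS` table, the element names, and the `check_form` helper are hypothetical, and the postal-code patterns are simplified shapes rather than official postal rules. It only checks that an address is valid in form, not that it exists.

```python
import re

# Hypothetical per-country "definitions": the required elements plus a
# postal-code pattern. Simplified assumptions, not official postal rules.
ADDRESS_DEFINITIONS = {
    "US": {
        "required": ["street", "city", "state", "postal_code"],
        "postal_code": re.compile(r"\d{5}(-\d{4})?"),
    },
    "CA": {
        "required": ["street", "city", "province", "postal_code"],
        "postal_code": re.compile(r"[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d"),
    },
}


def check_form(country, address):
    """Return a list of form problems; an empty list means 'valid in form'."""
    definition = ADDRESS_DEFINITIONS.get(country)
    if definition is None:
        return ["no definition for country " + country]

    problems = []
    # Every required element must be present and non-blank.
    for element in definition["required"]:
        if not address.get(element, "").strip():
            problems.append("missing " + element)

    # The postal code, if given, must match the country's expected shape.
    postal_code = address.get("postal_code", "").strip()
    if postal_code and not definition["postal_code"].fullmatch(postal_code):
        problems.append("postal_code has unexpected form")
    return problems
```

The same table could be read by other programs to decide which fields to ask for per country, which is the second use suggested above.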
Data::Address::Standardize
Scrape::USPS::ZipLookup
ISO 3166 Country Codes

Good luck. I'd be interested in seeing what you come up with.

-Lee

"To be civilized is to deny one's nature."

Re: Re: International Addresses
by krazken (Scribe) on Mar 06, 2002 at 19:22 UTC
    Well, I am starting with Canada. I am creating lookup structures that hold things like the valid words in a French address, the valid words in an English address, and the valid abbreviations for each street type (e.g. street=st, abbey=abbey, etc.). Once I have all of that, the majority of the rest will come from information compiled from the web, just by searching for postal information on each individual country. Very manual, I know. Oh yeah... to throw another wrench into it, I have to be able to handle multiple character sets as well, so Unicode is a must.
      Are you trying to parse this out of freeform text, or do you already know which piece of data is which?

      As an aside, I posted a module here for looking up ISO country codes. I don't know if you will find it useful or not. I have one for US states too, which I will post later.
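As a rough illustration of the lookup structures described above, here is a minimal Python sketch. The `STREET_TYPES` table, its entries, and the `street_type` helper are hypothetical and only illustrative; a real table would be filled in from the postal documentation gathered for each country. Input is normalized to Unicode NFC and case-folded so that accented words compare consistently.

```python
import unicodedata

# Illustrative entries only: full street-type word -> abbreviation, per language.
STREET_TYPES = {
    "en": {"street": "st", "avenue": "ave", "abbey": "abbey"},
    "fr": {"rue": "rue", "avenue": "av", "chemin": "ch"},
}


def normalize(word):
    """Compose to NFC, case-fold, and drop trailing periods ('St.' -> 'st')."""
    return unicodedata.normalize("NFC", word).casefold().rstrip(".")


def street_type(word, language):
    """Return the abbreviation for a street-type word, or None if unknown."""
    table = STREET_TYPES.get(language, {})
    key = normalize(word)
    if key in table:           # full word, e.g. "Street"
        return table[key]
    if key in table.values():  # already an abbreviation, e.g. "St."
        return key
    return None
```

For example, `street_type("Street", "en")` and `street_type("St.", "en")` both give `"st"`, while a word that is not in the table gives `None`.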

      -Lee

      "To be civilized is to deny one's nature."
      I am not sure what you mean by "valid words". Would this mean that an address must contain one of the special words, such as "street", "st", "lane" or "road", somewhere in the address field to be OK?

      I'd say that is an impossible task, unless some countries actually have such strict policies for what counts as a valid address. Speaking for Sweden: we have lots of addresses containing "gata", which means "street", but lots of addresses don't, and some addresses are just the name of a village, or something smaller than that, with or without a number after it. Yet other addresses are something that would translate to "Mailbox XXX", which is not the same thing as a PO box (we have those too), etc... Frankly, I can't see any pattern that matches all of our addresses other than /.+/.

      Either I misunderstood what you mean, or I think it will be impossible to create these rules, unless you do as some e-commerce sites do and check addresses against where people live according to central government registers. And that was clearly not your goal... :)


      You have moved into a dark place.
      It is pitch black. You are likely to be eaten by a grue.
        You are correct about the valid words. Hopefully each country documents what its valid types of thoroughfares are. I can't check against central government registers, because the majority of the world's governments don't let their postal files out of the country: some because they don't want to, others because they have no idea where everyone lives. What is the difference between Mailbox and Box? (Box is what I have as the Swedish name for a PO Box; is that right?)