I have worked on cleaning up municipal public record data in the US (Wisconsin) for a number of years, and parsing addresses is an incredibly complex subject. So, as the previous post mentions, you need to be more specific.
I guess you have to determine what you need out of the addresses. Are you looking to CASS-certify them, or do some other sort of checking? There are many possible goals.
If you're not sure, and you're dealing with addresses in the US, you might want to check out www.usps.gov, which has extensive documentation on US postal standards and will help you work out what counts as valid address information.
As a general parsing technique, if you split an address line on whitespace, on any non-alphanumeric characters, and at any number-letter (or letter-number) boundary, it becomes much easier to determine which parts represent street numbers, directions, street types, units, etc.
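Here is a rough sketch of that splitting step, written in Python for brevity (the same regex idea carries over to Perl). The names `tokenize_address` and `label_tokens` and the tiny lookup sets are made up for illustration; real data would need much larger tables, and the labels are only a first guess, not a full parse.

```python
import re

# A token is a run of letters OR a run of digits. Whitespace and
# punctuation are dropped, and "123B" naturally breaks into "123", "B".
TOKEN_RE = re.compile(r"[A-Za-z]+|[0-9]+")

# Hypothetical, deliberately tiny lookup sets (assumptions for this sketch).
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
STREET_TYPES = {"ST", "AVE", "RD", "BLVD", "DR", "LN", "CT"}
UNIT_WORDS = {"APT", "STE", "UNIT"}


def tokenize_address(line):
    """Split on whitespace, non-alphanumerics, and letter/digit boundaries."""
    return TOKEN_RE.findall(line)


def label_tokens(tokens):
    """Attach a rough first-guess label to each token."""
    labeled = []
    for tok in tokens:
        upper = tok.upper()
        if tok.isdigit():
            kind = "number"
        elif upper in DIRECTIONS:
            kind = "direction"
        elif upper in STREET_TYPES:
            kind = "street_type"
        elif upper in UNIT_WORDS:
            kind = "unit_word"
        else:
            kind = "word"
        labeled.append((tok, kind))
    return labeled


if __name__ == "__main__":
    for tok, kind in label_tokens(tokenize_address("123B N. Main St., Apt #4")):
        print(tok, kind)
```

Once the line is broken up this way, deciding what each piece means becomes a matter of looking at a token's label and its neighbors, which is where the USPS documentation on suffixes and unit designators comes in handy.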