furry_marmot has asked for the wisdom of the Perl Monks concerning the following question:
Update: And that's why I hang out here. :-) Thanks to GrandFather and ikegami for pointing out the trees in my forest. After the street number, an address may contain a unit/apt number, a direction (SW/NE/etc.), and a multi-part street name separate from the street suffix (Blvd/Drive/etc.). I have an address-parsing function I use for another application, but the code below processes potentially several thousand addresses to aggregate market information, and benchmarking showed the parser was much slower here, where I'm just prettying up addresses for reports.
Somewhere along the line, I lost sight of the fact that the street suffix, if there is one, will always be at the end, obviating any need for context -- as it should be.
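To illustrate the point about the suffix always being the last word, here is a minimal sketch. The contents of the lookup table and the helper name `abbreviate_suffix` are assumptions made up for this example; the real `%street_suf_lkup` is not shown in this post.

```perl
use strict;
use warnings;

# Hypothetical lookup table; the real %street_suf_lkup is not shown here.
my %street_suf_lkup = (circle => 'Cir', way => 'Way', drive => 'Dr');

# Hypothetical helper: abbreviate only the final word of an address.
sub abbreviate_suffix {
    my ($address) = @_;
    # Anchoring at the end of the string means earlier words such as
    # the "Circle" in "Circle Way" are never candidates.
    $address =~ s{(\w+)$}{
        exists $street_suf_lkup{lc $1} ? $street_suf_lkup{lc $1} : $1
    }e;
    return $address;
}

print abbreviate_suffix('123 Circle Way'), "\n";
print abbreviate_suffix('456 Oak Circle'), "\n";
```

Because only the last word is ever looked up, no knowledge of the word's position in the list is needed.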
Hello Wise and Noble Monks.
I'm using map to break apart addresses from a variety of sources, standardize casing, abbreviations, and numbering, and then stitch them back together, like so:
    # Rewritten
    $prop->{'address'} =                          # reassemble the address
        join ' ',
        map {
            s/^(#.+)/\U$1/;                       # for units, such as #A or #215-C
            s/^(#?)0+([1-9]\S+)/$1$2/;            # correct numbering, like 0000048th -> 48th
            s/^(mc|o')(.+$)/\u\L$1\E\u\L$2/i;     # correct casing of O'Brien, McDonald, etc.
            s/\.+$//g;                            # remove literal dots at the end of elements.
            $_                                    # required at end of map when used like this
        }
        map { ucfirst lc $_ }                     # proper case words, simple
        split ' ', $prop->{'address'};            # split on spaces

    # Normalize street suffixes to standard postal abbreviations
    # This is easier than creating a temp array.
    $prop->{'address'} =~ s/ (\w+)$ /
        if (defined $street_suf_lkup{lc $1}) { $street_suf_lkup{lc $1} }
        else { $1 }
    /ex;
This has worked fine for a few years. However, the other day, I came across the address 123 Circle Way. My program obediently changed it to 123 Cir Way, which then choked up the works further down the line.
My question is whether I can tell if there are any more elements coming through the map pipe, as it were. In other words, if Circle were the last element of the split address, it should be abbreviated, but if it's any other element (as in Circle Way), it should be left alone.
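As far as I know, map does not expose a counter or a "how many are left" value inside the block. One way to get that effect is to map over the indices instead of the elements, so the last position can be compared against `$#words`. This is only a sketch: the lookup table contents and the name `tidy_address` are hypothetical, and it covers just the casing and suffix steps.

```perl
use strict;
use warnings;

# Hypothetical lookup table, standing in for the real %street_suf_lkup.
my %street_suf_lkup = (circle => 'Cir', way => 'Way', drive => 'Dr');

# Hypothetical helper: proper-case each word, abbreviate only the last one.
sub tidy_address {
    my ($address) = @_;
    my @words = split ' ', $address;
    return join ' ', map {
        my $word = ucfirst lc $words[$_];
        # $_ is the index here, so the last element is easy to spot.
        if ($_ == $#words && exists $street_suf_lkup{lc $word}) {
            $word = $street_suf_lkup{lc $word};
        }
        $word;
    } 0 .. $#words;
}

print tidy_address('123 circle way'), "\n";
print tidy_address('456 OAK CIRCLE'), "\n";
```

The trade-off is that the block now works on `$words[$_]` rather than on `$_` directly, which is a little noisier than the plain element-wise map.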
I fully realize there are other ways to do this by rewriting the code, but I like using map this way.
Any thoughts on this?
Thanks. Marmot.
Re: Counting elements being mapped via map{}
by GrandFather (Saint) on Sep 22, 2009 at 01:20 UTC

Re: Counting elements being mapped via map{}
by ELISHEVA (Prior) on Sep 22, 2009 at 01:40 UTC

Re: Counting elements being mapped via map{}
by ikegami (Patriarch) on Sep 22, 2009 at 01:21 UTC