Update: And that's why I hang out here. :-) Thanks to GrandFather and ikegami for pointing out the trees in my forest. After the street number, an address may contain a unit/apt number, a direction (SW/NE/etc.), and a multi-part street name separate from the street suffix (Blvd/Drive/etc.). I have an address-parsing function that I use for another application, but the code below processes potentially several thousand addresses to aggregate market information, and benchmarking showed that the parser was much slower here, where I'm just prettying up addresses for reports.

Somewhere along the line, I lost sight of the fact that the street suffix, if there is one, will always be at the end, obviating any need for context -- as it should be.

Hello Wise and Noble Monks.

I'm using map to break apart addresses from a variety of sources, standardize casing, abbreviations, and numbering, and then stitch them back together, like so:

    # Rewritten
    $prop->{'address'} =                       # reassemble the address
        join ' ',
        map {
            s/^(#.+)/\U$1/;                    # for units, such as #A or #215-C
            s/^(#?)0+([1-9]\S+)/$1$2/;         # correct numbering, like 0000048th -> 48th
            s/^(mc|o')(.+$)/\u\L$1\E\u\L$2/i;  # correct casing of O'Brien, McDonald, etc.
            s/\.+$//g;                         # remove literal dots at the end of elements.
            $_                                 # required at end of map when used like this
        }
        map { ucfirst lc $_ }                  # proper case words, simple
        split ' ', $prop->{'address'};         # split on spaces

    # Normalize street suffixes to standard postal abbreviations
    # This is easier than creating a temp array.
    $prop->{'address'} =~ s/ (\w+)$ /
        if (defined $street_suf_lkup{lc $1}) { $street_suf_lkup{lc $1} }
        else { $1 }
    /ex;

This has worked fine for a few years. However, the other day, I came across the address 123 Circle Way. My program obediently changed it to 123 Cir Way, which then choked up the works further down the line.

My question is whether I can tell if there are any more elements coming through the map pipe, as it were. In other words, if Circle were the last element of the split address, it should be abbreviated, but if it's any other element (as in Circle Way), it should be left alone.
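For illustration, one way to let a map block know whether it is looking at the last element is to map over the indices rather than the values. This is only a minimal sketch: the pretty_address sub and the contents of %street_suf_lkup below are hypothetical stand-ins, since the real lookup table isn't shown in this post.

```perl
use strict;
use warnings;

# Hypothetical lookup table; the real %street_suf_lkup is not shown here.
my %street_suf_lkup = (
    circle    => 'Cir',
    way       => 'Way',
    drive     => 'Dr',
    boulevard => 'Blvd',
);

# Hypothetical helper: proper-case each word, then abbreviate the
# street suffix only when it is the final element of the address.
sub pretty_address {
    my ($address) = @_;
    my @parts = map { ucfirst lc $_ } split ' ', $address;

    # Map over indices so the block can compare its position to $#parts.
    return join ' ', map {
        my $word = $parts[$_];
        if ( $_ == $#parts && exists $street_suf_lkup{ lc $word } ) {
            $word = $street_suf_lkup{ lc $word };
        }
        $word;
    } 0 .. $#parts;
}

print pretty_address('123 circle way'),  "\n";   # 123 Circle Way
print pretty_address('123 main circle'), "\n";   # 123 Main Cir
```

Because the block receives an index instead of the element itself, "Circle" in the middle of "123 Circle Way" is left alone, while a trailing "Circle" is abbreviated.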

I fully realize there are other ways to do this by rewriting the code, but I like using map this way.

Any thoughts on this?

Thanks. Marmot.


In reply to Counting elements being mapped via map{} by furry_marmot
