in reply to regex CA to California

If you need to expand all of the state abbreviations, you could use a hash lookup and build a regular expression from the keys using alternation.

use strict;
use warnings;

my %states = (
    CA => q{California},
    MA => q{Massachusetts},
    IL => q{Illinois},
    PA => q{Pennsylvania},
    MI => q{Michigan},
    NJ => q{New Jersey},
    IN => q{Indiana},
);

my $rxStateAbbrev = do {
    local $" = q{|};
    qr{(?x) \b ( @{ [ keys %states ] } ) \b };
};

while ( <DATA> ) {
    s{$rxStateAbbrev}{ $states{ $1 } }eg;
    print;
}

__END__
Tommy Savage:408-724-0140: 12 2 2 Oxbow Court, Sunnyvale, CA 94087: 5/19/66: 34200
Lesle Kerstin: 408-456-123 4: 4 Harvard Square, Boston, MA 02133: 4/22/62: 52600
JonDeLoach: 408-253-3 122: 12 3 Park St. , San Jose, CA 94086: 7/25/53: 85100
Ephram Hardy:293-259-5395: 2 3 5 Carlton Lane, Joliet, IL 73858: 8/12/20: 56700
etty Boop: 245-836-83 57: 63 5 Cutesy Lane, Hollywood, CA 91464: 6/23/23: 14500
Wilhelm Kopf:846-836-2837 : 693 7 Ware Road, Milton, PA 93756: 9/21/46: 43500
Norma Corder:397-857 -2735: 74 Pine Street, Dearborn, MI 23874: 3/28/45: 245700
James Ikeda: 834-938-8376: 2 3 445 Aster Ave. , Allentown, NJ 83745: 12/1/38: 45000
Lori Gortz: 327-832-5728: 3 465 Mirlo Street, Peabody, MA 34756: 10/2/65: 35200
Barbara Kerz:385-573 -8326: 83 2 Ponce Drive, Gary, IN 83756: 12/15/46: 268500

The output.

Tommy Savage:408-724-0140: 12 2 2 Oxbow Court, Sunnyvale, California 94087: 5/19/66: 34200
Lesle Kerstin: 408-456-123 4: 4 Harvard Square, Boston, Massachusetts 02133: 4/22/62: 52600
JonDeLoach: 408-253-3 122: 12 3 Park St. , San Jose, California 94086: 7/25/53: 85100
Ephram Hardy:293-259-5395: 2 3 5 Carlton Lane, Joliet, Illinois 73858: 8/12/20: 56700
etty Boop: 245-836-83 57: 63 5 Cutesy Lane, Hollywood, California 91464: 6/23/23: 14500
Wilhelm Kopf:846-836-2837 : 693 7 Ware Road, Milton, Pennsylvania 93756: 9/21/46: 43500
Norma Corder:397-857 -2735: 74 Pine Street, Dearborn, Michigan 23874: 3/28/45: 245700
James Ikeda: 834-938-8376: 2 3 445 Aster Ave. , Allentown, New Jersey 83745: 12/1/38: 45000
Lori Gortz: 327-832-5728: 3 465 Mirlo Street, Peabody, Massachusetts 34756: 10/2/65: 35200
Barbara Kerz:385-573 -8326: 83 2 Ponce Drive, Gary, Indiana 83756: 12/15/46: 268500

I hope this is helpful.

Cheers,

JohnGG

Re^2: regex CA to California
by JavaFan (Canon) on Nov 13, 2010 at 00:26 UTC
    One problem you have with this approach is that you'll replace *any* occurrence of a state abbreviation with the full state name, without any regard for the structure of the address. For instance, 'MA Baker', who lives on "AZ Square, NYC", will not be pleased with the mangling of her address.
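    A minimal sketch of that failure mode, assuming a made-up record in the thread's colon-separated layout and a hypothetical two-entry table (neither the record nor the AZ entry comes from the original post):

```perl
use strict;
use warnings;

# Hypothetical two-entry lookup table, just enough for the demonstration.
my %states = ( MA => 'Massachusetts', AZ => 'Arizona' );

# Same idea as the parent post: an alternation of the keys between word boundaries.
my $alternation = join '|', map { quotemeta } sort keys %states;
my $rx          = qr{\b($alternation)\b};

# Made-up record; only the last "MA" is really a state abbreviation.
my $line = 'MA Baker:555-0100: 1 AZ Square, Boston, MA 02133';

# Copy the line, then expand every match in the copy.
( my $mangled = $line ) =~ s{$rx}{$states{$1}}g;

print "$mangled\n";
# prints: Massachusetts Baker:555-0100: 1 Arizona Square, Boston, Massachusetts 02133
```

    Only the last substitution was wanted; the name and the street were rewritten as well.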

    In fact, the OP already has the state abbreviation in a variable, which doesn't contain anything else. Just a

    $st = $states{$st} || $st;
    will do.
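    As a small sketch of that lookup-with-fallback (the helper name expand_state and the two-entry table are made up for illustration, not taken from the OP's code):

```perl
use strict;
use warnings;

# Hypothetical lookup table; the real one would list every state.
my %states = ( CA => 'California', MA => 'Massachusetts' );

# Hypothetical helper: return the full name, or the input unchanged
# when the abbreviation is not in the table.
sub expand_state {
    my ($st) = @_;
    return $states{$st} || $st;
}

print expand_state('CA'), "\n";    # California
print expand_state('ZZ'), "\n";    # ZZ (no entry, left alone)
```

    No regex is involved, so nothing outside the state field can be touched.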

    Oh, and why does your s/// have the /e modifier?
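    For reference, a sketch of the difference, assuming a one-entry table and a sample string invented for the purpose: with /e the replacement is evaluated as Perl code, while without it the replacement is an ordinary interpolated string.

```perl
use strict;
use warnings;

# Hypothetical one-entry table and sample text, for illustration only.
my %states = ( CA => 'California' );

my $with_e    = 'Sunnyvale, CA 94087';
my $without_e = $with_e;

# With /e the replacement is Perl code, so the spaces around the
# expression are insignificant.
$with_e =~ s{\b(CA)\b}{ $states{ $1 } }e;

# Without /e the replacement is an interpolated string, so it is written
# tightly: literal spaces here would end up in the result.
$without_e =~ s{\b(CA)\b}{$states{$1}};

print "$with_e\n";       # Sunnyvale, California 94087
print "$without_e\n";    # Sunnyvale, California 94087
```

    Both forms give the same result here, so the modifier is not needed for a plain hash lookup.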

      Yes, good points which also occurred to me in the sleepless wee small hours. Just goes to show that you shouldn't post when over-tired :-(

      Slightly more robust, given the assumption that the state abbreviation is always followed by a 5-digit ZIP code.

      use strict;
      use warnings;

      my %states = (
          CA => q{California},
          MA => q{Massachusetts},
          IL => q{Illinois},
          PA => q{Pennsylvania},
          MI => q{Michigan},
          NJ => q{New Jersey},
          IN => q{Indiana},
      );

      # Match an abbreviation only when it is followed by whitespace
      # and five digits (the ZIP code).
      my $rxStateAbbrev = do {
          local $" = q{|};
          qr{(?x) \b ( @{ [ keys %states ] } ) \b (?= \s+ \d{5} ) };
      };

      while ( <DATA> ) {
          # Without /e the replacement is an interpolated string, so it is
          # written without padding; spaces there would appear in the output.
          s{ $rxStateAbbrev }{$states{$1}}x;
          print;
      }

      __END__
      Tommy Savage:408-724-0140: 12 2 2 Oxbow Court, Sunnyvale, CA 94087: 5/19/66: 34200
      Lesle Kerstin: 408-456-123 4: 4 Harvard Square, Boston, MA 02133: 4/22/62: 52600
      JonDeLoach: 408-253-3 122: 12 3 Park St. , San Jose, CA 94086: 7/25/53: 85100
      Ephram Hardy:293-259-5395: 2 3 5 Carlton Lane, Joliet, IL 73858: 8/12/20: 56700
      etty Boop: 245-836-83 57: 63 5 Cutesy Lane, Hollywood, CA 91464: 6/23/23: 14500
      Wilhelm Kopf:846-836-2837 : 693 7 Ware Road, Milton, PA 93756: 9/21/46: 43500
      Norma Corder:397-857 -2735: 74 Pine Street, Dearborn, MI 23874: 3/28/45: 245700
      James Ikeda: 834-938-8376: 2 3 445 Aster Ave. , Allentown, NJ 83745: 12/1/38: 45000
      Lori Gortz: 327-832-5728: 3 465 Mirlo Street, Peabody, MA 34756: 10/2/65: 35200
      Barbara Kerz:385-573 -8326: 83 2 Ponce Drive, Gary, IN 83756: 12/15/46: 268500

      Cheers,

      JohnGG