pinnacle has asked for the wisdom of the Perl Monks concerning the following question:

Question: Change CA to California.

Please Assist!!

while(my $inp = <DATA>) {
    my ($name,$ph,$addr,$date,$num) = split(/:/,$inp);
    my ($st,$city,$st_zip) = split(/,/,$addr);
    my ($n,$st,$zip) = split(/ /,$st_zip);
    if($st =~ /CA,/){
        my $conv = s/CA,/California/;
        print "$st\n";
    }
}
Tommy Savage:408-724-0140: 12 2 2 Oxbow Court, Sunnyvale, CA 94087: 5/19/66: 34200
Lesle Kerstin: 408-456-123 4: 4 Harvard Square, Boston, MA 02133: 4/22/62: 52600
JonDeLoach: 408-253-3 122: 12 3 Park St. , San Jose, CA 94086: 7/25/53: 85100
Ephram Hardy:293-259-5395: 2 3 5 Carlton Lane, Joliet, IL 73858: 8/12/20: 56700
etty Boop: 245-836-83 57: 63 5 Cutesy Lane, Hollywood, CA 91464: 6/23/23: 14500
Wilhelm Kopf:846-836-2837 : 693 7 Ware Road, Milton, PA 93756: 9/21/46: 43500
Norma Corder:397-857 -2735: 74 Pine Street, Dearborn, MI 23874: 3/28/45: 245700
James Ikeda: 834-938-8376: 2 3 445 Aster Ave. , Allentown, NJ 83745: 12/1/38: 45000
Lori Gortz: 327-832-5728: 3 465 Mirlo Street, Peabody, MA 34756: 10/2/65: 35200
Barbara Kerz:385-573 -8326: 83 2 Ponce Drive, Gary, IN 83756: 12/15/46: 268500

Replies are listed 'Best First'.
Re: regex CA to California
by choroba (Cardinal) on Nov 12, 2010 at 18:19 UTC
    There's no comma after "CA" in your data. Moreover, you are assigning the result to $conv (with =, not binding with =~; note the missing ~), and $conv is never used again.
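    A minimal sketch of both points, using the first record of the OP's data hard-coded in a string rather than read from DATA. After the three splits the state field is just CA, with no trailing comma; binding with =~ makes s/// act on $st and return the number of substitutions. The name $street is a hypothetical rename (the OP reused $st for the street field).

```perl
use strict;
use warnings;

# First record of the OP's data, hard-coded for illustration.
my $inp = 'Tommy Savage:408-724-0140: 12 2 2 Oxbow Court, Sunnyvale, CA 94087: 5/19/66: 34200';

my ($name, $ph, $addr, $date, $num) = split /:/, $inp;
# $street is a hypothetical rename; the OP reused $st for this field.
my ($street, $city, $st_zip) = split /,/, $addr;
# $st_zip is ' CA 94087'; the leading space yields an empty first field.
my (undef, $st, $zip) = split / /, $st_zip;

print "[$st]\n";    # [CA] : no comma, so /CA,/ cannot match

# Binding with =~ makes s/// work on $st instead of $_; in scalar
# context it returns the number of substitutions made (false if none).
my $count = ($st =~ s/\bCA\b/California/);
print "$count $st\n";    # 1 California
```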
Re: regex CA to California
by ikegami (Patriarch) on Nov 12, 2010 at 18:55 UTC

    Aside from the aforementioned problem with the comma, you're substituting the contents of $_ (since you didn't say otherwise using =~). Fix:

    if ($st =~ /, CA\b/) {
        $st =~ s/, CA\b/, California/;
        print "$st\n";
    }

    But why match twice? The following suffices.

    if ($st =~ s/, CA\b/, California/) { print "$st\n"; }
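    Note that ", CA" only appears in the full address field; after the OP's last split, $st holds just "CA". A minimal sketch, assuming the substitution is applied to the address field ($addr) instead, shows that the return value of s/// serves as the test, so the record is changed and printed only when a replacement actually happened. The sample addresses are copied from the OP's data, including the leading space left by split(/:/).

```perl
use strict;
use warnings;

# Address fields as split(/:/) leaves them (note the leading space).
my @addrs = (
    ' 12 2 2 Oxbow Court, Sunnyvale, CA 94087',
    ' 4 Harvard Square, Boston, MA 02133',
);

my @changed;
for my $addr (@addrs) {
    # s/// returns the number of substitutions (false if none),
    # so testing and replacing happen in a single step.
    if ($addr =~ s/, CA\b/, California/) {
        print "$addr\n";
        push @changed, $addr;
    }
}
```

    Because the loop variable aliases each array element, @addrs itself is updated in place.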
Re: regex CA to California
by eff_i_g (Curate) on Nov 12, 2010 at 18:24 UTC
Re: regex CA to California
by johngg (Canon) on Nov 12, 2010 at 23:34 UTC

    If you need to expand all of the state abbreviations, you could use a hash lookup and build a regular expression from the hash keys using alternation.

    use strict;
    use warnings;

    my %states = (
        CA => q{California},
        MA => q{Massachusetts},
        IL => q{Illinois},
        PA => q{Pennsylvania},
        MI => q{Michigan},
        NJ => q{New Jersey},
        IN => q{Indiana},
    );

    my $rxStateAbbrev = do {
        local $" = q{|};
        qr{(?x) \b ( @{ [ keys %states ] } ) \b };
    };

    while ( <DATA> ) {
        s{$rxStateAbbrev}{ $states{ $1 } }eg;
        print;
    }

    __END__
    Tommy Savage:408-724-0140: 12 2 2 Oxbow Court, Sunnyvale, CA 94087: 5/19/66: 34200
    Lesle Kerstin: 408-456-123 4: 4 Harvard Square, Boston, MA 02133: 4/22/62: 52600
    JonDeLoach: 408-253-3 122: 12 3 Park St. , San Jose, CA 94086: 7/25/53: 85100
    Ephram Hardy:293-259-5395: 2 3 5 Carlton Lane, Joliet, IL 73858: 8/12/20: 56700
    etty Boop: 245-836-83 57: 63 5 Cutesy Lane, Hollywood, CA 91464: 6/23/23: 14500
    Wilhelm Kopf:846-836-2837 : 693 7 Ware Road, Milton, PA 93756: 9/21/46: 43500
    Norma Corder:397-857 -2735: 74 Pine Street, Dearborn, MI 23874: 3/28/45: 245700
    James Ikeda: 834-938-8376: 2 3 445 Aster Ave. , Allentown, NJ 83745: 12/1/38: 45000
    Lori Gortz: 327-832-5728: 3 465 Mirlo Street, Peabody, MA 34756: 10/2/65: 35200
    Barbara Kerz:385-573 -8326: 83 2 Ponce Drive, Gary, IN 83756: 12/15/46: 268500

    The output.

    Tommy Savage:408-724-0140: 12 2 2 Oxbow Court, Sunnyvale, California 94087: 5/19/66: 34200
    Lesle Kerstin: 408-456-123 4: 4 Harvard Square, Boston, Massachusetts 02133: 4/22/62: 52600
    JonDeLoach: 408-253-3 122: 12 3 Park St. , San Jose, California 94086: 7/25/53: 85100
    Ephram Hardy:293-259-5395: 2 3 5 Carlton Lane, Joliet, Illinois 73858: 8/12/20: 56700
    etty Boop: 245-836-83 57: 63 5 Cutesy Lane, Hollywood, California 91464: 6/23/23: 14500
    Wilhelm Kopf:846-836-2837 : 693 7 Ware Road, Milton, Pennsylvania 93756: 9/21/46: 43500
    Norma Corder:397-857 -2735: 74 Pine Street, Dearborn, Michigan 23874: 3/28/45: 245700
    James Ikeda: 834-938-8376: 2 3 445 Aster Ave. , Allentown, New Jersey 83745: 12/1/38: 45000
    Lori Gortz: 327-832-5728: 3 465 Mirlo Street, Peabody, Massachusetts 34756: 10/2/65: 35200
    Barbara Kerz:385-573 -8326: 83 2 Ponce Drive, Gary, Indiana 83756: 12/15/46: 268500
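    The alternation is built by temporarily setting $" (the separator Perl puts between elements when an array is interpolated into a string) to "|". A small sketch of just that step, with a reduced hypothetical %states; the sort is an addition here so the output is predictable, since hash key order is not.

```perl
use strict;
use warnings;

my %states = (CA => 'California', MA => 'Massachusetts', NJ => 'New Jersey');

# Inside the do block $" is '|', so interpolating the key list
# joins the keys into an alternation string.
my $alternation = do { local $" = '|'; "@{[ sort keys %states ]}" };
print "$alternation\n";    # CA|MA|NJ

my $rx = qr/\b($alternation)\b/;
(my $line = 'Boston, MA 02133') =~ s/$rx/$states{$1}/g;
print "$line\n";           # Boston, Massachusetts 02133
```

    join('|', sort keys %states) builds the same string without touching $".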

    I hope this is helpful.

    Cheers,

    JohnGG

      One problem with this approach is that it replaces *any* occurrence of a state abbreviation with the full state name, without any regard for the structure of the address. For instance, "MA Baker", who lives on "AZ Square, NYC", will not be pleased with the mangling of her address.

      In fact, the OP already has the state abbreviation in a variable, one that contains nothing else. Just a

      $st = $states{$st} || $st;
      will do.
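      A minimal sketch of that lookup dropped into the OP's own split logic. Assumptions: a partial %states table, three records copied from the OP's data into a hypothetical @records array instead of being read from DATA, and $street as a hypothetical rename of the OP's first $st. Unknown abbreviations (PA here) fall through unchanged.

```perl
use strict;
use warnings;

# Partial table for illustration; add the other states as needed.
my %states = (
    CA => 'California',
    MA => 'Massachusetts',
    IL => 'Illinois',
);

# A few records from the OP's data; a real script would read <DATA>.
my @records = (
    'Tommy Savage:408-724-0140: 12 2 2 Oxbow Court, Sunnyvale, CA 94087: 5/19/66: 34200',
    'Lesle Kerstin: 408-456-123 4: 4 Harvard Square, Boston, MA 02133: 4/22/62: 52600',
    'Wilhelm Kopf:846-836-2837 : 693 7 Ware Road, Milton, PA 93756: 9/21/46: 43500',
);

my @out;
for my $inp (@records) {
    my ($name, $ph, $addr, $date, $num) = split /:/, $inp;
    my ($street, $city, $st_zip) = split /,/, $addr;
    my (undef, $st, $zip) = split / /, $st_zip;

    # Expand a known abbreviation; leave unknown ones alone.
    $st = $states{$st} || $st;
    push @out, "$name: $st";
}
print "$_\n" for @out;
```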

      Oh, and why does your s/// have the /e modifier?
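      A short sketch of what /e changes here. With /e the replacement is evaluated as Perl code, so spaces around $states{ $1 } are just code whitespace; without /e the replacement is an interpolated string, which already handles a plain hash lookup, but any padding spaces are then copied into the result. The %states table and input string are illustrative.

```perl
use strict;
use warnings;

my %states = (CA => 'California');

my $with_e    = 'Sunnyvale, CA 94087';
my $without_e = 'Sunnyvale, CA 94087';
my $padded    = 'Sunnyvale, CA 94087';

# /e: the replacement is code, so the padding spaces are harmless.
$with_e =~ s{\b(CA)\b}{ $states{ $1 } }e;

# No /e: an interpolated string is enough for a hash lookup.
$without_e =~ s{\b(CA)\b}{$states{$1}};

# No /e but padded: the spaces become part of the replacement text.
$padded =~ s{\b(CA)\b}{ $states{ $1 } };

print "[$with_e]\n[$without_e]\n[$padded]\n";
```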

        Yes, good points, which also occurred to me in the sleepless wee small hours. It just goes to show that you shouldn't post when over-tired :-(

        Here is a slightly more robust version, assuming that the state abbreviation is always followed by a 5-digit ZIP code.

        use strict;
        use warnings;

        my %states = (
            CA => q{California},
            MA => q{Massachusetts},
            IL => q{Illinois},
            PA => q{Pennsylvania},
            MI => q{Michigan},
            NJ => q{New Jersey},
            IN => q{Indiana},
        );

        # Only match an abbreviation followed by whitespace and a 5-digit
        # ZIP code; the lookahead checks this without consuming it.
        my $rxStateAbbrev = do {
            local $" = q{|};
            qr{(?x) \b ( @{ [ keys %states ] } ) \b (?= \s+ \d{5} ) };
        };

        while ( <DATA> ) {
            # No /e, so the replacement is an interpolated string; keep it
            # free of padding spaces, which would otherwise be copied literally.
            s{ $rxStateAbbrev }{$states{$1}}x;
            print;
        }

        __END__
        Tommy Savage:408-724-0140: 12 2 2 Oxbow Court, Sunnyvale, CA 94087: 5/19/66: 34200
        Lesle Kerstin: 408-456-123 4: 4 Harvard Square, Boston, MA 02133: 4/22/62: 52600
        JonDeLoach: 408-253-3 122: 12 3 Park St. , San Jose, CA 94086: 7/25/53: 85100
        Ephram Hardy:293-259-5395: 2 3 5 Carlton Lane, Joliet, IL 73858: 8/12/20: 56700
        etty Boop: 245-836-83 57: 63 5 Cutesy Lane, Hollywood, CA 91464: 6/23/23: 14500
        Wilhelm Kopf:846-836-2837 : 693 7 Ware Road, Milton, PA 93756: 9/21/46: 43500
        Norma Corder:397-857 -2735: 74 Pine Street, Dearborn, MI 23874: 3/28/45: 245700
        James Ikeda: 834-938-8376: 2 3 445 Aster Ave. , Allentown, NJ 83745: 12/1/38: 45000
        Lori Gortz: 327-832-5728: 3 465 Mirlo Street, Peabody, MA 34756: 10/2/65: 35200
        Barbara Kerz:385-573 -8326: 83 2 Ponce Drive, Gary, IN 83756: 12/15/46: 268500

        Cheers,

        JohnGG
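        A sketch of why the lookahead helps with the "MA Baker" case raised above: an abbreviation not followed by whitespace and five digits is left alone. The reduced %states table and the two sample lines are illustrative assumptions, not data from the thread.

```perl
use strict;
use warnings;

my %states = (MA => 'Massachusetts', AZ => 'Arizona');
my $alt = join '|', sort keys %states;

# (?= ... ) requires a ZIP code after the abbreviation but does not
# consume it, so the digits stay in the string untouched.
my $rx = qr/\b($alt)\b(?=\s+\d{5})/;

my @lines = (
    'MA Baker: AZ Square, NYC',
    '4 Harvard Square, Boston, MA 02133',
);
s/$rx/$states{$1}/ for @lines;
print "$_\n" for @lines;
```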

Re: regex CA to California
by Anonymous Monk on Nov 12, 2010 at 18:24 UTC
    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new(qr/\bCA\b/)->explain;
    __END__
    The regular expression:

    (?-imsx:\bCA\b)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      \b                       the boundary between a word char (\w) and
                               something that is not a word char
    ----------------------------------------------------------------------
      CA                       'CA'
    ----------------------------------------------------------------------
      \b                       the boundary between a word char (\w) and
                               something that is not a word char
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    $ perl -pe " s/\bCA\b/California/ "
    Hi ca Ca CA dooodie
    Hi ca Ca California dooodie
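    A small sketch of the same /\bCA\b/ substitution run over a few strings: the match is case-sensitive (ca and Ca survive), and the word boundaries keep it from firing inside words such as CAT or SCAN. The test strings other than the first are made up for illustration.

```perl
use strict;
use warnings;

my @inputs = ('Hi ca Ca CA dooodie', 'CAT SCAN', 'Sunnyvale, CA 94087');

# Copy each string, then substitute only a standalone, upper-case CA.
my @outputs = map { my $s = $_; $s =~ s/\bCA\b/California/; $s } @inputs;
print "$_\n" for @outputs;
```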