lev36 has asked for the wisdom of the Perl Monks concerning the following question:

Oh most wise Perl monks, I have a regular expression question.

Simply put, I want to match a phone number, IF it has a certain area code (800) AND a certain prefix (555), but NOT if it has a certain suffix (9999).

I.e., 800-555-nnnn should match unless 'nnnn' equals '9999'. Is there a way to do that in a single expression?

Many thanks,
Lev

2006-03-16 Retitled by planetscape, as per Monastery guidelines
Original title: 'Regular expression question'

Re: Phone Number Regular Expression
by Roy Johnson (Monsignor) on Mar 16, 2006 at 16:55 UTC

      Thanks, Roy, that's most helpful!

      Of course, it's not quite as simple as all that - I want to substitute a different area code, and make sure to capture things that are formatted slightly differently. So I tested out this substitution, and I'm having a little trouble with the scalars:

      $_ = "800 555-9998";
      s/800( |-|.)555-(?!9999)([0-9]{4})/888$1555-$2/;
      print $_;

      This doesn't seem to work, when I try to include $1 (the separator), though the second scalar comes through fine. I'm guessing I need something to set off $1 from '555'?

      Apologies if I'm missing some basic syntax rules; I'm fairly new to perl's regexp lingo.

        s/.../888${1}555-$2/ will do it for you. The braces around the variable name, specifically, are what you're looking for.

        Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
        How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

        Unrelated to the immediate problem, but you probably want ( |-|\.) instead of ( |-|.) near the start of your match: an unescaped dot has special meaning within a regex and, by default, matches any character except a newline (see perlre).

        This means that currently that part of your regex is equivalent to just (.), and will happily match something like 8003555-1234
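A small sketch of the difference, assuming a simplified pattern without the lookahead; the names $loose and $tight are made up for illustration:

```perl
use strict;
use warnings;

# Unescaped dot: the alternation ( |-|.) accepts any single character.
my $loose = qr/800( |-|.)555-\d{4}/;
# Escaped dot: only a space, a hyphen, or a literal period is accepted.
my $tight = qr/800( |-|\.)555-\d{4}/;

for my $number ('800.555-1234', '8003555-1234') {
    printf "%-14s loose=%s tight=%s\n",
        $number,
        ($number =~ $loose ? 'match' : 'no match'),
        ($number =~ $tight ? 'match' : 'no match');
}
```

Only the loose pattern should accept 8003555-1234, because its dot swallows the stray 3.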

        You can set off variables from their surrounding text by enclosing the variable name in braces:
        s/800( |-|.)555-(?!9999)([0-9]{4})/888${1}555-$2/;
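To see the braces at work, here is a minimal sketch. The helper name renumber is hypothetical, and the pattern uses the escaped dot suggested elsewhere in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: swap area code 800 for 888,
# keeping the original separator and leaving the 9999 suffix alone.
sub renumber {
    my ($text) = @_;
    # ${1} ends the variable name before the literal "555";
    # a bare $1555 would be read as capture group number 1555.
    $text =~ s/800( |-|\.)555-(?!9999)([0-9]{4})/888${1}555-$2/;
    return $text;
}

print renumber('800 555-9998'), "\n";   # 888 555-9998
print renumber('800-555-9999'), "\n";   # unchanged: 800-555-9999
```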

        Caution: Contents may have been coded under pressure.
      One more question: what if the phone number is split over two lines? Is there any way to easily deal with that?
        Sure. If you are saying that $_ has newlines in it, you can just remove them with tr/\n//d. Then process as before.
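A quick sketch of that first case, assuming the break falls right after a separator; the sample text is made up:

```perl
use strict;
use warnings;

my $text = "Call 800-555-\n1234 today";

# Copy the string, then delete every newline from the copy.
(my $joined = $text) =~ tr/\n//d;

print "$joined\n";                                         # Call 800-555-1234 today
print "matched\n" if $joined =~ /800-555-(?!9999)\d{4}/;
```

Note that tr/\n//d removes the line break outright, so a number whose only separator was the newline itself would end up with its digits run together.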

        It may be slightly trickier if you're saying the number is split across lines in an input file that you're reading one line at a time. Basically, you'll want to remember some of the previous line while you read the next line, stick them together, then look for the phone number. Exactly how you work that out depends on what you know about your input.
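One way the line-at-a-time idea might look, as a rough sketch rather than a finished tool. Everything here is an assumption for illustration: the helper names, the 888 replacement, the separator set (space, dot, or hyphen, each optionally followed by a newline), and the premise that a number only ever breaks at a separator and that every line except possibly the last ends in "\n".

```perl
use strict;
use warnings;

# Replace 800-555-nnnn with 888-555-nnnn unless nnnn is 9999.
# A separator is one of " ", "." or "-", optionally followed by a
# newline, or a bare newline.
sub renumber {
    my ($text) = @_;
    $text =~ s/800([ .-]\n?|\n)555([ .-]\n?|\n)(?!9999)(\d{4})/888${1}555$2$3/g;
    return $text;
}

# Hold each line back until the next one arrives, fix the two-line
# window, then emit the first line and keep the second.
sub fix_split_numbers {
    my @lines = @_;
    my @out;
    my $held;    # previous line, not yet emitted
    for my $line (@lines) {
        if (!defined $held) { $held = $line; next; }
        my $pair = renumber($held . $line);
        # The substitution keeps every newline, so the window still
        # splits cleanly after its first one.
        my ($done, $rest) = $pair =~ /\A(.*?\n)(.*)\z/s ? ($1, $2) : ($pair, '');
        push @out, $done;
        $held = $rest;
    }
    push @out, renumber($held) if defined $held;
    return @out;
}

print fix_split_numbers("Call 800-555-\n", "1234 or 800 555-9999\n", "bye\n");
```

Because the replacement preserves the newlines it captured, the output has the same line structure as the input, which matters if the files are written back in place.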


        Caution: Contents may have been coded under pressure.
Re: Phone Number Regular Expression
by ikegami (Patriarch) on Mar 16, 2006 at 18:27 UTC
    There's also
    /800-555-(?!9999)/
    and
    /(\d+)-(\d+)-(\d+)/ && $1 == 800 && $2 == 555 && $3 != 9999
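    The two approaches can be compared side by side. This is only a sketch: the sub names are invented, and the lookahead version adds \d{4} after the assertion (not part of the pattern above) so that four digits must actually follow.

```perl
use strict;
use warnings;

# Approach 1: a negative lookahead rejects the 9999 suffix inside the pattern.
sub wanted_by_lookahead {
    my ($number) = @_;
    return $number =~ /800-555-(?!9999)\d{4}/ ? 1 : 0;
}

# Approach 2: capture the three groups, then compare them in plain Perl.
sub wanted_by_compare {
    my ($number) = @_;
    return ( $number =~ /(\d+)-(\d+)-(\d+)/
          && $1 == 800 && $2 == 555 && $3 != 9999 ) ? 1 : 0;
}

for my $number ('800-555-1234', '800-555-9999', '800-444-1234') {
    printf "%s lookahead=%d compare=%d\n",
        $number, wanted_by_lookahead($number), wanted_by_compare($number);
}
```

    The capture-and-compare form is easier to extend with further conditions, but it only looks at the first three hyphen-separated digit groups, so a leading "1-" would throw it off.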
Re: Phone Number Regular Expression
by injunjoel (Priest) on Mar 16, 2006 at 17:02 UTC
    My 2 cents.
    /800\-555\-^9{4}/
    Update
    The above didn't fully meet the requirements; thanks, zer, for pointing that out.
    I would suggest using Roy_Johnson's approach as others have suggested.

    -InjunJoel

    "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
      oh you got me thanks

      Roy Johnson's example works how it is supposed to.

      injunjoel, 800-555-9888 doesn't work properly with your example.

Re: Phone Number Regular Expression
by TedPride (Priest) on Mar 16, 2006 at 20:39 UTC
    use strict;
    use warnings;

    my $area = 800;
    my $pre  = 555;
    my $not  = 9999;

    $_ = join '', <DATA>;

    # Print each matching number in normalized form, skipping the 9999 suffix.
    print "$area-$pre-$1\n"
        while m/$area(?:\)?\s|\.|-)$pre[\s\.-]((?!9999)\d{4})/g;

    __DATA__
    1-800-555-9999
    1-800-555-3456
    (800) 555-3456
    800 555 3456
    800 555-3456
    800.555.3456
    800 555.5555
    I think this covers all eventualities.

      Elegant, thanks!

      What I'm doing is editing a bunch of files on a web server to reflect the new area code - so I had planned to read each line of the input file, make the replacement, and then save the new version (and a backup of the old version). Then I thought about the line-break issue.

      So I think Roy's solution of grabbing a couple of lines together, checking them, and then splitting them again before writing to the new file is what I need to do, in order to catch numbers that split over two lines. I see some stuff in your example that can help me with that.
