Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

This follows up on a post I made the other day; I just have another question. I tweaked the code so that it now rejects numbers with fewer than 7 digits or more than 10. How can I keep it from matching something like 1 aasdasd 2 adasddsa 3 adsadbbf 4 asdasd 5 6 7? Right now it matches digits that are separated by anything, and I need to make sure they AREN'T. They can be separated by (), spaces, or a ., but not letters.
my %seen;
open(FILE, '<', 'file.txt') or die "Unable to open file.txt for reading, $!";
while (<FILE>) {
    chomp;
    tr/0-9//cd;                              # delete everything that is not a digit
    if (length($_) >= 7 && length($_) <= 10) {
        $_ = sprintf("%010s", $_);           # left-pad to 10 characters
        $_ =~ s/(\d{3})(\d{3})(\d{4})/$1-$2-$3/;
        $seen{$_}++;
    }
}
close(FILE);

Re: phone number regex (new question)
by TomDLux (Vicar) on Mar 11, 2004 at 19:53 UTC

    Save a whole lot of effort by not deleting the non-numbers.

    Instead, specify the acceptable separator characters in the regex.

    $_ =~ s/\(?(\d{3})\)?   # area code with optional parentheses
            [-.\s]?         # possibly followed by separator
            (\d{3})         # first part of number
            [-.\s]?         # optional separator
            (\d{4})         # last part of number
           /$1-$2-$3/x;

    You'd be better off using existing modules. If the closest modules are not quite good enough, why not modify the module, and possibly send your edits to the module maintainer?

    I'm allowing a period '.', a dash '-', or a whitespace character as the separator. The area code may or may not have parentheses.
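To see how the separator-based approach treats the input from the question, here is a minimal sketch. The helper name format_phone and the sample strings are made up for illustration; the pattern is the one described above, used as a match instead of a substitution. It only looks for 10-digit numbers and is not anchored, so it will also pick a number out of a longer line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number as NNN-NNN-NNNN, or nothing
# if the line holds no 10-digit number with acceptable separators.
sub format_phone {
    my ($line) = @_;
    if ($line =~ /\(?(\d{3})\)?   # area code with optional parentheses
                  [-.\s]?         # optional separator
                  (\d{3})         # first part of number
                  [-.\s]?         # optional separator
                  (\d{4})         # last part of number
                 /x) {
        return "$1-$2-$3";
    }
    return;
}

my %seen;
my @samples = (
    '(212) 555-1234',
    '212.555.1234',
    '1 aasdasd 2 adasddsa 3 adsadbbf 4 asdasd 5 6 7',   # digits split by letters
);
for my $line (@samples) {
    my $number = format_phone($line);
    $seen{$number}++ if defined $number;
}
print "$_ seen $seen{$_} time(s)\n" for sort keys %seen;
```

The first two samples normalize to the same key, and the third is skipped because its digits are never adjacent enough to fill the three groups.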

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: phone number regex (new question)
by Roy Johnson (Monsignor) on Mar 11, 2004 at 22:21 UTC
    die "or whatever" if /[^0-9()\- .]/;
    Update: Escaped the hyphen. Thanks, revdiablo.

    The PerlMonk tr/// Advocate

      I'd be careful with a character class like that. I haven't tested it, but the ')-' part might try to create a range starting at ')' and ending at ' '. This is obviously not what was intended. Just to be safe, I'd put the '-' at the end of the character class.
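A small sketch of the same check with the hyphen moved to the end of the class, where it can only be a literal. The helper name only_phone_chars and the sample strings are assumptions for illustration, not part of the original post. Note that an empty string also passes this check, so it should be combined with a digit count.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the text contains nothing but digits,
# parentheses, spaces, dots, and hyphens.
sub only_phone_chars {
    my ($text) = @_;
    # The hyphen comes last in the class, so it cannot start a range.
    return $text !~ /[^0-9() .-]/;
}

for my $text ('(212) 555-1234', '1 aasdasd 2 adasddsa 3') {
    printf "%-25s %s\n", $text, only_phone_chars($text) ? 'ok' : 'rejected';
}
```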

Re: phone number regex (new question)
by pbeckingham (Parson) on Mar 12, 2004 at 05:06 UTC

    Be aware of the inherent assumptions here: most numbers listed or quoted these days are 10 digits, and in fact a good old 7-digit number is often inadequate, as it is becoming increasingly hard to guess the area code. I regularly encounter phone numbers with a half-dozen area codes, all of which are considered "local".

    In addition to this, the leading "1" is regularly but unnecessarily given.

    If we were to consider international numbers, then there really is no good way to write a regex to extract them.
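For the non-international cases mentioned above (7 or 10 digits, with an optional leading "1"), one possible way to normalize is sketched below. The helper name normalize_digits is hypothetical, and it assumes the input has already been reduced to digits only. Whether a bare 7-digit number should be accepted at all is a policy choice this sketch does not settle.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a string of digits and returns a 7- or
# 10-digit number, dropping a leading "1" from an 11-digit number.
# Returns nothing for any other length.
sub normalize_digits {
    my ($digits) = @_;
    $digits =~ s/^1(?=\d{10}\z)//;    # strip the "1" only when 10 digits follow
    return $digits if length($digits) == 7 || length($digits) == 10;
    return;
}

for my $digits ('12125551234', '2125551234', '5551234', '123') {
    my $number = normalize_digits($digits);
    print "$digits => ", (defined $number ? $number : 'rejected'), "\n";
}
```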

Re: phone number regex (new question)
by wolfi (Scribe) on Mar 12, 2004 at 13:09 UTC

    food for thought from a newbie on your regex... this worked for a few of my tests, but i haven't fully experimented w/it.

    i also tried to condense this w/references, but the script would either fail (due to nested quantifiers) or i'd rec'v my 'sorry you goofed' statement... the few ref's i could use didn't save much space. So, if anyone has a better way to revise this - feel free.

    $number =~ /^\s*1?(\.|\-)?\s*((\(|\[|\{)?\s*(\.|\-)?\s*\d\s*(\.|\-)?\s*\d\s*(\.|\-)?\s*\d\s*(\.|\-)?\s*(\)|\]|\})?)?\s*(\.|\-)?\s*\d\s*(\.|\-)?\s*\d\s*(\.|\-)?\s*\d\s*(\.|\-)?\s*\d\s*(\.|\-)?\s*\d\s*(\.|\-)?\s*\d\s*(\.|\-)?\s*\d\s*(\.|\-)?\s*$/;

    (i know, it's sooo ugly.)

    Basically breaks down like this:

    ^\s*                 # start / optional spaces
    1?(\.|\-)?\s*        # an optional 1- or 1. with optional spaces
    (
      (\(|\[|\{)?        # an optional bracket ( or { or [
      \s*(\.|\-)?\s*     # spaces and an optional . or -
      \d\s*(\.|\-)?\s*   # digit, optional . or - and spaces
      \d\s*(\.|\-)?\s*   # ditto
      \d\s*(\.|\-)?\s*   # ditto
      (\)|\]|\})?        # choice of closing brackets
    )?                   # that section was optional
    \s*(\.|\-)?\s*       # spaces and an optional . or -
    \d\s*(\.|\-)?\s*     # digit, optional . or - and spaces
    \d\s*(\.|\-)?\s*     # ditto
    \d\s*(\.|\-)?\s*     # ditto
    \d\s*(\.|\-)?\s*     # ditto
    \d\s*(\.|\-)?\s*     # ditto
    \d\s*(\.|\-)?\s*     # ditto
    \d\s*(\.|\-)?\s*     # ditto
    $                    # nothing more allowed

    it'll yield numbers w/ or w/out a leading "1" - w/ or w/out an area code (w/ or w/out parentheses) then seven digits - all of which can be separated by white-space, a dash, or a dot ->
    1.(234]-567.8.9.0.0 should be readable. You could then run it thru a loop to get it into a format you preferred, if ya wished (prob just removing everything 'cept digits would be easiest.) It's tough to account for every format a user would try, but this would do most.
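One possible way to condense the pattern, offered as a sketch and not a drop-in replacement: build the repeated pieces once with qr// and repeat them with a quantifier on a non-capturing group. The variable names $sep, $digit, and $phone are made up for illustration. The sketch accepts any one of the three closing brackets, which is what the breakdown's comments describe.

```perl
use strict;
use warnings;

# Hypothetical building blocks; the names are illustrative only.
my $sep   = qr/\s*[.-]?\s*/;    # optional dot or dash with optional spaces
my $digit = qr/\d$sep/;         # one digit plus whatever separator follows

my $phone = qr/
    ^ \s* 1? [.-]? \s*          # optional leading 1, then optional dot or dash
    (?:
        [(\[{]? $sep            # optional opening bracket
        (?:$digit){3}           # area code
        [)\]}]?                 # optional closing bracket
    )?                          # the whole area code section is optional
    $sep
    (?:$digit){7}               # the seven digit number
    $
/x;

for my $input ('1.(234]-567.8.9.0.0', '555-1234', '1 aasdasd 2 adasddsa 3') {
    if ($input =~ $phone) {
        (my $digits = $input) =~ tr/0-9//cd;    # keep only the digits
        print "$input => $digits\n";
    }
    else {
        print "$input => no match\n";
    }
}
```

Wrapping the interpolated piece in (?:...) before the {3} or {7} keeps Perl from reading the braces as a hash subscript and applies the quantifier to the whole digit-plus-separator unit.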

    knock-wood