#2 is not rejected. It shows as a match: 1-413-777-8888

#5 was a typo. I meant #6

#9 results in two different matches.

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate";
$nysus = $PM . ' ' . $MCF;

Re^3: Regex for extracting phone numbers from string
by GrandFather (Saint) on Apr 11, 2016 at 21:05 UTC

    Anchoring both ends of the match to whitespace fixes 2 and 6. 9 is harder to fix without knowing what else may be in the string, and thus how much "buffer" is needed. However, 9 could be fixed with a second test on the overall length of a normalized string. Indeed, the whole matching process becomes easier if a first pass extracts candidate numbers and a second pass rejects the ones that aren't the right length.

    use strict;
    use warnings;
    use 5.010;

    my $string = <<STR;
    Example 1: 413-577-1234
    Example 2: 981-413-777-8888
    Example 3: 413.233.2343
    Example 4: 562-3113
    Example 5: 401 311 7898
    Example 6: 2342343-23878-878-2343
    Example 7: 1 (413) 555-2378
    Example 8: 4135552378
    Example 9: 413 789 8798 343 9878
    STR

    extract($string);

    sub extract {
        my %results = ();
        my $string = shift;

        # pad string with spaces to make it easier to find phone numbers
        # at beginning and end of strings
        $string = ' ' . $string . ' ';

        # get rid of consecutive whitespace characters to make regex
        # easier and faster
        $string =~ s/(\s){2,}/$1/g;

        # find patterns in the string that look like phone numbers
        my @matches = $string =~ /
            (Example\s\d+:\s)
            # Anchor the left end of the phone number
            (?<=\s)
            # Look for ten digit North American numbers
            (
            (?: 1(?:\.|\s|-))*      # optional 1 followed by period OR whitespace or dash
            \(?                     # optional opening paren
            \d{3}                   # three consecutive digits
            (?:\.|\)|-|\s|\)\s)?    # optional punctuation (period, close paren, dash, whitespace)
            \d{3}                   # three consecutive digits
            (?:\.|-|\s)?            # optional punctuation
            \d{4}                   # 4 consecutive digits
            |
            # Look for seven digit North American numbers
            \d{3}                   # three consecutive digits
            (?:\.|-|\s)?            # optional punctuation
            \d{4}                   # 4 consecutive digits
            )
            # Anchor the right end of the phone number
            (?=\s)
            /gx;

        say shift @matches, "match: '", shift @matches, "'" while @matches;
    }

    Prints:

    Example 1: match: '413-577-1234'
    Example 3: match: '413.233.2343'
    Example 4: match: '562-3113'
    Example 5: match: '401 311 7898'
    Example 7: match: '1 (413) 555-2378'
    Example 8: match: '4135552378'
    Example 9: match: '413 789 8798'
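
    The two-pass idea mentioned above (extract candidates first, then reject by normalized length) might look roughly like the sketch below. It is not the code from this post: the helper name extract_by_length, the set of characters allowed inside a candidate, and the rule of dropping a leading 1 from 11-digit numbers are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): two-pass extraction.
sub extract_by_length {
    my ($text) = @_;
    my @numbers;

    # Pass 1: take any whitespace-delimited run that starts with a digit
    # (or an opening paren plus digit), ends with a digit, and contains
    # only digits, parens, periods, dashes and spaces in between.
    # The allowed character set is an assumption.
    while ($text =~ /(?<!\S)(\(?\d[\d().\- ]*\d)(?!\S)/g) {
        my $candidate = $1;

        # Pass 2: normalize to bare digits and test the overall length.
        (my $digits = $candidate) =~ s/\D//g;

        # Assumption: an 11-digit number with a leading 1 carries a
        # country code, which is dropped before the length check.
        $digits =~ s/^1// if length($digits) == 11;

        push @numbers, $candidate
            if length($digits) == 7 || length($digits) == 10;
    }
    return @numbers;
}

my $sample = join "\n",
    'Example 1: 413-577-1234',
    'Example 2: 981-413-777-8888',
    'Example 7: 1 (413) 555-2378',
    'Example 9: 413 789 8798 343 9878',
    '';

print "$_\n" for extract_by_length($sample);
```

    Unlike the regex above, this sketch rejects example 9 entirely rather than matching its first ten digits, because the whole run of digits and spaces is treated as one candidate. For the same reason, a stray number sitting next to a real phone number (separated only by a space) would cause both to be rejected, which is the "buffer" problem again.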
    Premature optimization is the root of all job security