in reply to Regex for extracting phone numbers from string
OK, after sleeping on it I decided to experiment more with positive look-ahead and look-behind assertions. They seem to have done the trick. Here is the new code with new examples:
use strict;
use warnings;
use feature 'say';

my $string = "Example 1: 413-577-1234 Example 2: 981-413-777-8888 Example 3: 413.233.2343 Example 4: 562-3113 Example 5: 401 311 7898 Example 6: 55555-55555-555-5555 Example 7: 1 (413) 555-2378 Example 7: 1(413)666-2378 Example 8: 4135552378 Example 9: 413 789 8798 343 9878 Example 10: 22222222222222222222";

extract($string);

sub extract {
    my $string = shift;

    # remove extraneous whitespace to simplify regex
    $string =~ s/(\s){2,}/$1/g;

    # add double spaces to both ends of string to make it easier to find
    # phone numbers at beginning and end of strings
    $string = '  ' . $string . '  ';

    # find patterns in the string that look like phone numbers
    my @matches = $string =~ /
        # Look for ten digit North American numbers
        (?<=\D\s)              # Positive look-behind assertion to avoid false positives
        (?:1(?:\.|\s|-)*)*     # optional 1 followed by period OR optional whitespace or dash
        (?<=\s)                # Positive look-behind assertion to avoid false positives
        (?:\(?\d{3}\) |        # optional three digits surrounded by parens
        (?<=\s)                # Positive look-behind assertion to avoid false positives
        \d{3})                 # three consecutive digits
        (?:\.|-|\s|\)\s)?      # optional punctuation (period, dash, whitespace)
        \d{3}                  # three consecutive digits
        (?:\.|-|\s)?           # optional punctuation
        \d{4}                  # 4 consecutive digits
        (?=\s\D)               # Positive look-ahead assertion to avoid false positives
        |
        # Look for seven digit North American numbers
        (?<=\D\s)              # Positive look-behind assertion to avoid false positives
        \d{3}                  # three consecutive digits
        (?:\.|-|\s)?           # optional punctuation
        \d{4}                  # 4 consecutive digits
        (?=\s\D)               # Positive look-ahead assertion to avoid false positives
    /gx;

    # get rid of matches with new lines
    @matches = grep { index($_, "\n") == -1 } @matches;

    say "Match: '$_'" for @matches;
}
This outputs six phone numbers which I think any human would agree are in the original input:
Match: '413-577-1234'
Match: '413.233.2343'
Match: '562-3113'
Match: '401 311 7898'
Match: '1 (413) 555-2378'
Match: '4135552378'
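For anyone who wants to see the look-around idea in isolation, here is a stripped-down sketch that handles only the seven-digit case. The sub name seven_digit_candidates and the sample sentence are made up for illustration and are not part of the code above. It assumes the same trick of padding the string with two spaces so that the look-behind and look-ahead still have something to check at either end.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not from the post above): returns only the
# seven-digit candidates, using the same look-around guards.
sub seven_digit_candidates {
    my ($text) = @_;

    # Pad with two spaces so a number at the very start or end still
    # has a non-digit plus whitespace on each side.
    $text = "  $text  ";

    return $text =~ /
        (?<=\D\s)    # must be preceded by a non-digit, then whitespace
        \d{3}        # three consecutive digits
        [.\s-]?      # optional period, whitespace, or dash
        \d{4}        # four consecutive digits
        (?=\s\D)     # must be followed by whitespace, then a non-digit
    /gx;
}

my @found = seven_digit_candidates('Call 562-3113 or 981-413-777-8888 today');
say "Match: '$_'" for @found;    # prints: Match: '562-3113'
```

The long run 981-413-777-8888 is rejected because none of its seven-digit slices has whitespace plus a non-digit on both sides, which is the same reason Example 2 drops out of the full version.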
$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate";
$nysus = $PM . ' ' . $MCF;