Anchoring both ends of the match to whitespace fixes examples 2 and 6. Example 9 is harder to fix without knowing what else may be in the string, and thus how much "buffer" is needed around a number. However, 9 could be handled by a second test on the overall length of a normalized string. Indeed, the whole matching process becomes easier with two passes: a first pass to extract candidate numbers, then a second pass to reject candidates that aren't the right length.
use strict;
use warnings;
use 5.010;
my $string = <<STR;
Example 1: 413-577-1234
Example 2: 981-413-777-8888
Example 3: 413.233.2343
Example 4: 562-3113
Example 5: 401 311 7898
Example 6: 2342343-23878-878-2343
Example 7: 1 (413) 555-2378
Example 8: 4135552378
Example 9: 413 789 8798 343 9878
STR
extract($string);
sub extract {
    my $string = shift;

    # Pad the string with spaces so that numbers at the very beginning
    # or end still have whitespace to anchor against
    $string = ' ' . $string . ' ';

    # Collapse runs of whitespace to make the regex easier and faster
    $string =~ s/(\s){2,}/$1/g;

    # Find patterns in the string that look like phone numbers
    my @matches = $string =~ /
        (Example\s\d+:\s)
        # Anchor the left end of the phone number
        (?<=\s)
        # Look for ten digit North American numbers
        (
            (?: 1(?:\.|\s|-))?      # optional 1 followed by period, whitespace or dash
            \(?                     # optional opening paren
            \d{3}                   # three consecutive digits
            (?:\.|\)|-|\s|\)\s)?    # optional punctuation (period, close paren, dash, whitespace)
            \d{3}                   # three consecutive digits
            (?:\.|-|\s)?            # optional punctuation
            \d{4}                   # four consecutive digits
        |   # Look for seven digit North American numbers
            \d{3}                   # three consecutive digits
            (?:\.|-|\s)?            # optional punctuation
            \d{4}                   # four consecutive digits
        )
        # Anchor the right end of the phone number
        (?=\s)
    /gx;

    # @matches alternates between label and number, so print them in pairs
    say shift @matches, "match: '", shift @matches, "'" while @matches;
}
Prints:
Example 1: match: '413-577-1234'
Example 3: match: '413.233.2343'
Example 4: match: '562-3113'
Example 5: match: '401 311 7898'
Example 7: match: '1 (413) 555-2378'
Example 8: match: '4135552378'
Example 9: match: '413 789 8798'
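The two-pass idea mentioned at the top could be sketched roughly like this. It is only an illustration under some assumptions: extract_by_length is a hypothetical helper name, a "candidate" is taken to be any whitespace-delimited run of digits, spaces, dots, dashes and parens on a single line, and the accepted lengths (7, 10, or 11 with a leading 1) are my reading of the North American formats in the examples.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the code above.
# Pass 1 grabs anything that looks like a run of phone digits and
# punctuation; pass 2 strips the punctuation and checks the length.
sub extract_by_length {
    my ($text) = @_;
    my @found;

    for my $line ( split /\n/, $text ) {
        # Pass 1: a candidate starts with an optional paren and a digit,
        # ends with a digit, and is delimited by whitespace on both sides.
        while ( $line =~ /(?<!\S)(\(?\d[\d\s().-]*\d)(?!\S)/g ) {
            my $candidate = $1;

            # Pass 2: normalize to bare digits, then test the length.
            ( my $digits = $candidate ) =~ tr/0-9//cd;
            my $len = length $digits;

            push @found, $candidate
                if $len == 7
                or $len == 10
                or ( $len == 11 and $digits =~ /^1/ );
        }
    }
    return @found;
}

my $sample = join "\n",
    'Example 1: 413-577-1234',
    'Example 2: 981-413-777-8888',
    'Example 9: 413 789 8798 343 9878';

print "match: '$_'\n" for extract_by_length($sample);
```

With this approach example 9 is rejected outright, because the whole run normalizes to 17 digits, rather than being matched on its first ten. The trade-off is that two legitimate numbers separated only by a space would be rejected as well, so it still depends on what else can appear in the string.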