OK, after sleeping on it I decided to experiment more with positive look-ahead and look-behind assertions. They seem to have done the trick. Here is the new code with new examples:

use strict;
use warnings;
use feature 'say';

my $string = "Example 1: 413-577-1234 Example 2: 981-413-777-8888 Example 3: 413.233.2343 Example 4: 562-3113 Example 5: 401 311 7898 Example 6: 55555-55555-555-5555 Example 7: 1 (413) 555-2378 Example 7: 1(413)666-2378 Example 8: 4135552378 Example 9: 413 789 8798 343 9878 Example 10: 22222222222222222222";

extract($string);

sub extract {
    my $string = shift;

    # remove extraneous whitespace to simplify regex
    $string =~ s/(\s){2,}/$1/g;

    # add double spaces to both ends of string to make it easier to find
    # phone numbers at beginning and end of strings
    $string = '  ' . $string . '  ';

    # find patterns in the string that look like phone numbers
    my @matches = $string =~ /
        # Look for ten digit North American numbers
        (?<=\D\s)             # Positive look-behind assertion to avoid false positives
        (?:1(?:\.|\s|-)*)*    # optional 1 followed by period OR optional whitespace or dash
        (?<=\s)               # Positive look-behind assertion to avoid false positives
        (?:\(?\d{3}\) |       # optional three digits surrounded by parens
        (?<=\s)               # Positive look-behind assertion to avoid false positives
        \d{3})                # three consecutive digits
        (?:\.|-|\s|\)\s)?     # optional punctuation (period, dash, whitespace)
        \d{3}                 # three consecutive digits
        (?:\.|-|\s)?          # optional punctuation
        \d{4}                 # 4 consecutive digits
        (?=\s\D)              # Positive look-ahead assertion to avoid false positives
        |
        # Look for seven digit North American numbers
        (?<=\D\s)             # Positive look-behind assertion to avoid false positives
        \d{3}                 # three consecutive digits
        (?:\.|-|\s)?          # optional punctuation
        \d{4}                 # 4 consecutive digits
        (?=\s\D)              # Positive look-ahead assertion to avoid false positives
    /gx;

    # get rid of matches with new lines
    @matches = grep { index($_, "\n") == -1 } @matches;

    say "Match: '$_'" for @matches;
}

This outputs six phone numbers which I think any human would agree are in the original input:

Match: '413-577-1234'
Match: '413.233.2343'
Match: '562-3113'
Match: '401 311 7898'
Match: '1 (413) 555-2378'
Match: '4135552378'
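
To see in isolation what the look-around assertions are buying, here is a minimal sketch that compares a bare seven-digit pattern with the same pattern wrapped in the look-behind and look-ahead used above. The helper name find_candidates and the short sample text are made up for this illustration and are not part of the code above; the padding with two spaces mirrors the trick used in extract().

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper for illustration only: returns every substring of
# $text that matches $pattern.
sub find_candidates {
    my ($text, $pattern) = @_;

    # Pad both ends with two spaces so a two-character look-around
    # can still succeed at the very start or end of the text.
    my $padded = "  $text  ";

    # With /g and no capture groups, list context returns all matches.
    return $padded =~ /$pattern/g;
}

# Made-up sample: one plausible number and one long run of digit groups.
my $text = 'call 562-3113 not 55555-55555-555-5555';

# Bare pattern: happily matches inside longer digit runs.
my $bare = qr/\d{3}-\d{4}/;

# Guarded pattern: must be preceded by "non-digit, whitespace" and
# followed by "whitespace, non-digit".
my $guarded = qr/(?<=\D\s)\d{3}-\d{4}(?=\s\D)/;

my @loose  = find_candidates($text, $bare);
my @strict = find_candidates($text, $guarded);

say "bare:    @loose";
say "guarded: @strict";
```

The bare pattern picks up pieces of the long digit run as well as the real number, while the guarded pattern only accepts a candidate that is set off by whitespace on both sides. That is the same reason Examples 2, 6, 9 and 10 drop out of the results above.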

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate";
$nysus = $PM . ' ' . $MCF;


In reply to Re: Regex for extracting phone numbers from string by nysus
in thread Regex for extracting phone numbers from string by nysus
