Anchoring both ends of the match to whitespace fixes examples 2 and 6. Example 9 is harder to fix without knowing what else may be in the string, and thus how much "buffer" is needed. However, 9 could be handled by a second test on the overall length of the normalized string. Indeed, the whole matching process becomes easier if a first pass extracts candidate numbers and a second pass rejects the ones that aren't the right length.

use strict;
use warnings;
use 5.010;

my $string = <<STR;
Example 1: 413-577-1234
Example 2: 981-413-777-8888
Example 3: 413.233.2343
Example 4: 562-3113
Example 5: 401 311 7898
Example 6: 2342343-23878-878-2343
Example 7: 1 (413) 555-2378
Example 8: 4135552378
Example 9: 413 789 8798 343 9878
STR

extract($string);

sub extract {
    my %results = ();
    my $string = shift;

    # pad string with spaces to make it easier to find phone numbers
    # at beginning and end of strings
    $string = ' ' . $string . ' ';

    # get rid of consecutive whitespace characters to make regex easier and faster
    $string =~ s/(\s){2,}/$1/g;

    # find patterns in the string that look like phone numbers
    my @matches = $string =~ /
        (Example\s\d+:\s)
        # Anchor the left end of the phone number
        (?<=\s)
        # Look for ten digit North American numbers
        (
            (?: 1(?:\.|\s|-))*      # optional 1 followed by period, whitespace or dash
            \(?                     # optional opening paren
            \d{3}                   # three consecutive digits
            (?:\.|\)|-|\s|\)\s)?    # optional punctuation (period, close paren, dash, whitespace)
            \d{3}                   # three consecutive digits
            (?:\.|-|\s)?            # optional punctuation
            \d{4}                   # 4 consecutive digits
        |
            # Look for seven digit North American numbers
            \d{3}                   # three consecutive digits
            (?:\.|-|\s)?            # optional punctuation
            \d{4}                   # 4 consecutive digits
        )
        # Anchor the right end of the phone number
        (?=\s)
    /gx;

    say shift @matches, "match: '", shift @matches, "'" while @matches;
}

Prints:

Example 1: match: '413-577-1234'
Example 3: match: '413.233.2343'
Example 4: match: '562-3113'
Example 5: match: '401 311 7898'
Example 7: match: '1 (413) 555-2378'
Example 8: match: '4135552378'
Example 9: match: '413 789 8798'
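
The two-pass idea from the opening paragraph could be sketched roughly as follows. This is only an illustration, not tested against real-world data: the sub name extract_two_pass is made up for the example, and it assumes that valid numbers normalize to 7 digits, 10 digits, or 11 digits with a leading 1. The first pass grabs any whitespace-delimited run of digits and phone punctuation on a single line; the second pass strips everything but digits and checks the length.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper for illustration only (not part of the code above).
sub extract_two_pass {
    my ($text) = @_;

    # Pass 1: candidates start with a digit or '(' and end with a digit,
    # contain only digits, horizontal whitespace, dots, parens and dashes,
    # and are delimited by whitespace (or the string boundary) on both sides.
    my @candidates = $text =~ /(?<!\S)([\d(][\d\h.()-]*\d)(?!\S)/g;

    # Pass 2: normalize each candidate to bare digits and keep it only if
    # the length is plausible (assumption: 7, 10, or 11 with a leading 1).
    my @numbers;
    for my $candidate (@candidates) {
        (my $digits = $candidate) =~ tr/0-9//cd;    # delete all non-digits
        my $len = length $digits;
        push @numbers, $candidate
            if $len == 7
            || $len == 10
            || ( $len == 11 && $digits =~ /^1/ );
    }
    return @numbers;
}

say "match: '$_'"
    for extract_two_pass("Example 1: 413-577-1234\nExample 9: 413 789 8798 343 9878\n");
```

With this approach example 9 is rejected as a whole, because the full run normalizes to 17 digits, rather than yielding a partial match. The trade-off is that the length test alone does not check grouping, so an oddly punctuated string such as 12-34-567 would still pass; a stricter pattern could be applied to the surviving candidates if that matters.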
Premature optimization is the root of all job security

In reply to Re^3: Regex for extracting phone numbers from string by GrandFather
in thread Regex for extracting phone numbers from string by nysus
