
Anchoring both ends of the match to whitespace fixes examples 2 and 6. Example 9 is harder to fix without knowing what else may be in the string, and thus how much "buffer" is needed around a number. However, 9 could be caught by a second test on the overall length of the normalized number (digits only). Indeed, the whole matching process becomes easier with a first pass that extracts candidate numbers and a second pass that rejects candidates that aren't the right length.
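To see why the anchoring matters for examples 2 and 6, here is a minimal sketch that uses only the seven-digit branch of the pattern (written compactly with a character class) and runs it with and without the whitespace lookarounds. The hash names are just for illustration, and the ten-digit branch has the same problem: unanchored, it would pick `413-777-8888` out of example 2.

```perl
use strict;
use warnings;
use 5.010;

# Seven-digit branch of the pattern, once unanchored and once anchored
# to whitespace on both sides, as in the main regex.
my $unanchored = qr/(\d{3}[.\s-]?\d{4})/;
my $anchored   = qr/(?<=\s)(\d{3}[.\s-]?\d{4})(?=\s)/;

my %loose_hit;     # example number => match from the unanchored pattern
my %strict_hit;    # example number => match from the anchored pattern

for my $example ([4, '562-3113'], [2, '981-413-777-8888'],
                 [6, '2342343-23878-878-2343']) {
    my ($n, $text) = @$example;
    my $padded = " $text ";    # pad so the anchors also work at the ends
    ($loose_hit{$n})  = $padded =~ $unanchored;
    ($strict_hit{$n}) = $padded =~ $anchored;
    printf "Example %d: unanchored %s, anchored %s\n", $n,
        map { defined $_ ? "'$_'" : 'no match' } $loose_hit{$n}, $strict_hit{$n};
}
```

Without anchors the pattern happily matches a fragment from the middle of a longer digit run (`777-8888` in example 2, `2342343` in example 6); with anchors only the genuine example 4 survives.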

use strict;
use warnings;
use 5.010;

my $string = <<STR;
Example 1: 413-577-1234
Example 2: 981-413-777-8888 
Example 3: 413.233.2343 
Example 4: 562-3113 
Example 5: 401 311 7898
Example 6: 2342343-23878-878-2343
Example 7: 1 (413) 555-2378
Example 8: 4135552378
Example 9: 413 789 8798 343 9878
STR
extract($string);


sub extract {
    my $string = shift;

    # pad string with spaces to make it easier to find phone numbers at
    # beginning and end of strings
    $string = ' ' . $string . ' ';

    # get rid of consecutive whitespace characters to make regex easier
    # and faster
    $string =~ s/(\s){2,}/$1/g;

    # find patterns in the string that look like phone numbers
    my @matches = $string =~ /
        (Example\s\d+:\s)
        
        # Anchor the left end of the phone number
        (?<=\s)
        
        # Look for ten digit North American numbers
        (
            (?:1(?:\.|\s|-))?       # optional 1 followed by period,
                                    # whitespace or dash
            \(?                     # optional opening paren
            \d{3}                   # three consecutive digits
            (?:\.|\)|-|\s|\)\s)?    # optional punctuation (period, close
                                    # paren, dash, whitespace, or close
                                    # paren followed by whitespace)
            \d{3}                   # three consecutive digits
            (?:\.|-|\s)?            # optional punctuation
            \d{4}                   # 4 consecutive digits
            
            | # Look for seven digit North American numbers
            
            \d{3}              # three consecutive digits
            (?:\.|-|\s)?       # optional punctuation
            \d{4}              # 4 consecutive digits
        )
        
        # Anchor the right end of the phone number
        (?=\s)
        
        /gx;

    # @matches holds (label, number) pairs; print them two at a time
    while (my ($label, $number) = splice @matches, 0, 2) {
        say $label, "match: '", $number, "'";
    }

}

Prints:

Example 1: match: '413-577-1234'
Example 3: match: '413.233.2343'
Example 4: match: '562-3113'
Example 5: match: '401 311 7898'
Example 7: match: '1 (413) 555-2378'
Example 8: match: '4135552378'
Example 9: match: '413 789 8798'
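Example 9 is still matched. The two-pass approach suggested at the top could look something like the following minimal sketch. The sub name `find_phone_numbers` and the candidate pattern are illustrative assumptions, not tested production code: the first pass grabs every run that starts and ends with a digit and contains only digits, spaces, tabs, dots, dashes and parentheses; the second pass strips everything but the digits and keeps only runs of 7 or 10 digits, or 11 digits with a leading 1.

```perl
use strict;
use warnings;
use 5.010;

sub find_phone_numbers {
    my ($text) = @_;
    my @accepted;

    # First pass: candidate runs. Only spaces and tabs are allowed inside
    # a run, so a candidate never spans two lines.
    for my $candidate ($text =~ /(\d[\d(). \t-]*\d)/g) {
        # Second pass: normalize to digits only, then test the length.
        (my $digits = $candidate) =~ tr/0-9//cd;
        my $len = length $digits;
        push @accepted, $candidate
            if $len == 7
            || $len == 10
            || ($len == 11 && substr($digits, 0, 1) eq '1');
    }
    return @accepted;
}

my $sample = <<'STR';
Example 1: 413-577-1234
Example 2: 981-413-777-8888
Example 3: 413.233.2343
Example 4: 562-3113
Example 5: 401 311 7898
Example 6: 2342343-23878-878-2343
Example 7: 1 (413) 555-2378
Example 8: 4135552378
Example 9: 413 789 8798 343 9878
STR

say "match: '$_'" for find_phone_numbers($sample);
```

Because example 9 is taken as one 17-digit run, it is rejected along with 2 and 6. The catch is that only the digit count is checked, so something like `12 34 5 67` would pass as a seven-digit number; a stricter second pass could also check the grouping.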

Premature optimization is the root of all job security

In reply to Re^3: Regex for extracting phone numbers from string by GrandFather
in thread Regex for extracting phone numbers from string by nysus
