When I do stuff like this I like to regularize the data first by stripping out the punctuation that makes things more complicated. In most of the US it's not too hard to tell whether something is a phone number: it will generally have 7, 10, or 11 digits (except inside companies' private exchanges and a few small towns like Volcano Village, HI) plus some form of separators that depend on where the writer is from and what mood they were in at the time. I included a little twist for extensions, which are usually appended as x\d+, with or without a space before the x.

The example below strips out the punctuation between groups of digits, then checks the length of the resulting digit runs. If a run is in the 7 to 11 digit range I declare the string to be a phone number; anything else is treated as part of an address.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use v5.10;

    my @numbers = (
        '(123)456-7890', "222.222.2222", "1-313-345-6798",
        "23-35 Baker St. Apt 6", "666 666 6666", "123-345.5678",
        "45 elm street", "123-345.5678x999", "666 666 6666 x233",
    );

    foreach my $number (@numbers) {
        # keep the original string for the address branch
        my $address = $number;
        # strip phone number punctuation:
        $number =~ s/\(?(\d+)[-(). ](\d|x\d)/$1$2/g;
        if ($number =~ m/\d{7,11}/) {
            # you could regularize phone number formatting in here
            say $number." Phone number";
        }
        else {
            say $address." Address";
            # process the number as an address
            $address =~ m/(\d+)/;
            say "address number $1";
        }
    }

with output

    1234567890 Phone number
    2222222222 Phone number
    13133456798 Phone number
    23-35 Baker St. Apt 6 Address
    address number 23
    6666666666 Phone number
    1233455678 Phone number
    45 elm street Address
    address number 45
    1233455678x999 Phone number
    6666666666x233 Phone number
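The "regularize phone number formatting" comment in the loop could be filled in roughly like this. It is only a sketch: the helper name format_us_phone and the "(NNN) NNN-NNNN xEXT" display format are assumptions for illustration, not something from the original code, and it expects the already-stripped strings shown in the output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;

# Hypothetical helper (not in the original post): turn a stripped number
# such as "1233455678x999" into a single display format.
sub format_us_phone {
    my ($stripped) = @_;

    # Split off an optional extension, written as x followed by digits.
    my ($digits, $ext) = $stripped =~ m/^(\d+)(?:x(\d+))?$/
        or return;    # not a bare digit run, leave it to the caller

    # Assumption: an 11-digit number starting with 1 carries the US country code.
    $digits =~ s/^1// if length($digits) == 11;

    my $formatted;
    if (length($digits) == 10) {
        $formatted = sprintf '(%s) %s-%s',
            substr($digits, 0, 3), substr($digits, 3, 3), substr($digits, 6);
    }
    elsif (length($digits) == 7) {
        $formatted = sprintf '%s-%s', substr($digits, 0, 3), substr($digits, 3);
    }
    else {
        return;       # any other length has no obvious US layout
    }

    $formatted .= " x$ext" if defined $ext;
    return $formatted;
}

say format_us_phone($_) // "$_ (unrecognized)"
    for qw(1234567890 13133456798 1233455678x999 5551234 12345678);
```

Returning nothing for lengths it doesn't recognize leaves the decision to the caller, which seems safer than guessing a layout.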

Note that I got lazy and didn't bother pulling out all the numbers within an address string: I only grab the first one, and I let it be any length rather than just your 3- and 4-digit runs. I also miss on numbers like 1-(800)-222-2222, where two separators sit side by side, but that's just a little more regex tweaking. I don't strip commas, since I don't think I've ever seen commas used to punctuate a US phone number. They might also be your big flag for lists of apartment numbers. If you're dealing with phone numbers in Europe you're probably doomed: they seem to have random numbers of digits over a very large range.
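As a rough sketch of that extra tweaking, the variant below lets a whole run of separators collapse at once and collects every digit run from an address. The helper name classify and its return shape are made up for illustration. The lookarounds around \d{7,11} are an added assumption too: they reject runs longer than 11 digits, which the unanchored match in the original would accept.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;

# Hypothetical helper (not in the original post): classify one string and
# return either the stripped phone number or every digit run in the address.
sub classify {
    my ($text) = @_;

    # Remove a whole run of separators after a digit group, so "-(" and ")-"
    # in 1-(800)-222-2222 collapse the same way a single "-" does.
    (my $stripped = $text) =~ s/\(?(\d+)[-(). ]+(?=x?\d)/$1/g;

    # Assumption: the digit run must be exactly 7 to 11 digits long,
    # optionally followed by an x\d+ extension.
    if ($stripped =~ m/(?<!\d)(\d{7,11})(?!\d)(x\d+)?/) {
        return { type => 'phone', number => $1 . ($2 // '') };
    }

    # Otherwise treat it as an address and keep all of its numbers.
    my @runs = $text =~ m/(\d+)/g;
    return { type => 'address', numbers => \@runs };
}

for my $text ('1-(800)-222-2222', '666 666 6666 x233', '23-35 Baker St. Apt 6') {
    my $result = classify($text);
    if ($result->{type} eq 'phone') {
        say "$text => phone $result->{number}";
    }
    else {
        say "$text => address numbers @{ $result->{numbers} }";
    }
}
```

With these three inputs the first two come back as phone numbers (18002222222 and 6666666666x233) and the third as an address with the numbers 23, 35 and 6. Commas are still left alone, as in the original.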


In reply to Re: Pull 3-digit and 4-digit numbers from string by bitingduck
in thread Pull 3-digit and 4-digit numbers from string by htmanning
