htmanning has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I am trying to pull 3 or 4 digit numbers from a database field, BUT not if it is a phone number.

We have apt numbers like 501, 1101, or 1201, etc. I need to recognize if it is an apartment number and create a link to an information page. If there is a phone number however, I need to ignore it.

I tried this, but there are several issues:

my ($unit4) = $text =~ /\b \d{4} \b/gx;
For one thing, sometimes $text has more than one number in it. It might be something like:
Sent notifications to 1402, 1304, 501
I need to grab all of those numbers and turn them into a link. I also need to ignore phone numbers like 222-2222 or 555-555-5555. Thanks,

Replies are listed 'Best First'.
Re: Pull 3-digit and 4-digit numbers from string
by aaron_baugher (Curate) on Apr 10, 2015 at 01:48 UTC

    This will depend a lot on how you define each format -- exactly what qualifies as an apartment number, and what qualifies as a phone number? Some apartment numbers have letters, for instance: Apt. #54a. But from your description, it sounds like you want to catch things like Apt. 302, but not 302-4321. In that case, you could grab all 3-4 digit strings that are not preceded or followed by a dash:

    #!/usr/bin/env perl
    use 5.010;
    use strict;
    use warnings;

    my $field = 'Apt.302. 123-4567. 4021'; # from database record
    $field = " $field "; # pad it
    while($field =~ /[^\d-](\d\d\d\d?)[^\d-]/g){
        say "Apartment: $1";
    }

    I'd guess for real-life data, you'll have to get fancier than that, but that may get you headed in the right direction.
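    One possible refinement, offered as a sketch rather than a tested drop-in: because the pattern above consumes the character on each side of a number, two numbers separated by a single character (for example "501 502") would yield only the first. Zero-width lookarounds avoid that and make the padding unnecessary. The sub name apartment_numbers and the sample string are made up for illustration.

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: return every 3- or 4-digit run that does not touch
# another digit or a dash. The lookarounds are zero-width, so the
# neighbouring characters are left available for the next match.
sub apartment_numbers {
    my ($field) = @_;
    return $field =~ /(?<![\d-])(\d{3,4})(?![\d-])/g;
}

say "Apartment: $_" for apartment_numbers('Apt.302. 123-4567. 4021 501 502');
```

    Like the original, this still treats only dash-punctuated numbers as phone numbers.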

    Aaron B.
    Available for small or large Perl jobs and *nix system administration; see my home node.

Re: Pull 3-digit and 4-digit numbers from string
by Athanasius (Archbishop) on Apr 10, 2015 at 02:54 UTC

    Hello htmanning,

    If the phone numbers are punctuated with spaces rather than hyphens, you will need a different strategy. Here is one such approach (which builds on aaron_baugher’s solution):

    #! perl
    use strict;
    use warnings;
    use Data::Dump;

    while (<DATA>)
    {
        chomp;                                 # Trim trailing newline
        my @apts = /[- \d]+/g;                 # Get all sequences of digits, spaces, & hyphens
        s/^[ ]+ //x for @apts;                 # Trim initial spaces
        s/ [ ]+$//x for @apts;                 # Trim trailing spaces
        @apts = grep { /^\d{3,4}$/ } @apts;    # Get 3- or 4-digit sequences only
        dd \@apts;
    }

    __DATA__
    John Smith lives in Apts. 123 & 456, home number: 555-6666-7777
    Please phone Jane Doe on 111 2222 3333. She lives in apartment 789 in the Main building.
    Please phone Janet Roe on 88899990123. She lives in apartment 100 in the Main building.

    Output:

    12:51 >perl 1211_SoPW.pl
    [123, 456]
    [789]
    [100]

    12:51 >

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Pull 3-digit and 4-digit numbers from string
by AnomalousMonk (Archbishop) on Apr 10, 2015 at 04:25 UTC

    As others have posted, much depends on the exact definition of a "phone" or "apartment" number. The following works well, except see what happens with a mixed-separator "phone" number like "123-345.5678" in the sample data: its leading "123" is reported as an apartment number. Is such a character sequence possible? This uses the regex enhancements of Perl version 5.10.

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use 5.010;
     ;;
     my $s = 'Sent to 1402, 222-2222, 1304, 555.555.5555 and 501, 666 666 6666, 123-345.5678';
     ;;
     my $sep = qr{ [-. ] }xms;
     my $pn  = qr{ \d{3} ($sep) (?: \d{3} \1)? \d{4} }xms;
     my $an  = qr{ \d{3,4} }xms;
     ;;
     my @caps = $s =~ m{ (?| $pn (*SKIP)(*F) | (?<! \d) ($an) (?! \d)) }xmsg;
     ;;
     printf qq{'$_' } for @caps;
    "
    '1402' '1304' '501' '123'
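    If mixed separators can occur in the data, one way to handle them (a sketch under that assumption, not a recommendation) is to drop the backreference so that each separator may independently be a dash, dot, or space. The sub name apartments_skipping_phones is made up for illustration; the patterns are otherwise the ones above.

```perl
use 5.010;
use strict;
use warnings;

my $sep = qr{ [-. ] }xms;
# No backreference: the two separators no longer have to be the same character.
my $pn  = qr{ \d{3} $sep (?: \d{3} $sep )? \d{4} }xms;
my $an  = qr{ \d{3,4} }xms;

# Hypothetical helper: phone-like sequences are matched first and discarded
# with (*SKIP)(*F); what remains is any 3- or 4-digit run not touching a digit.
sub apartments_skipping_phones {
    my ($s) = @_;
    return $s =~ m{ $pn (*SKIP)(*F) | (?<! \d) ($an) (?! \d) }xmsg;
}

my $s = 'Sent to 1402, 222-2222, 1304, 555.555.5555 and 501, 666 666 6666, 123-345.5678';
say join ' ', map { "'$_'" } apartments_skipping_phones($s);
```

    With either version of $pn, a space-separated pair such as "501 1304" has the shape of a 7-digit phone number and is skipped, so this relies on apartment lists being comma-separated.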


    Give a man a fish:  <%-(-(-(-<

Re: Pull 3-digit and 4-digit numbers from string
by bitingduck (Deacon) on Apr 10, 2015 at 06:07 UTC

    When I do stuff like this I like to regularize the data by stripping out punctuation that makes things more complicated. In most of the US it's not too hard to determine if something is a phone number-- it will generally have 7, 10, or 11 numerical digits (except inside companies' private exchanges and a few small towns like Volcano Village, HI) and some form of separators that depend on where whoever wrote it is from and what mood they were in when they wrote it. I included a little twist for extensions, which are usually appended as x\d+, where there may or may not be a space before the x.

    The example below will strip out the punctuation that's around the numbers then check the length of any runs. If it's in the 7 to 11 range I declare it to be a phone number and anything else is part of an address.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use v5.10;

    my @numbers=('(123)456-7890', "222.222.2222", "1-313-345-6798","23-35 Baker St. Apt 6", "666 666 6666", "123-345.5678", "45 elm street", "123-345.5678x999", "666 666 6666 x233");

    foreach my $number (@numbers){
        #strip phone number punctuation:
        my $address=$number;
        $number =~ s/\(?(\d+)[-(). ](\d|x\d)/$1$2/g;
        if ($number=~m/\d{7,11}/){
            # you could regularize phone number formatting in here
            say $number." Phone number";
        } else {
            say $address." Address";
            # process the number as an address
            $address =~ m/(\d+)/;
            say "address number $1";
        }
    }

    with output

    1234567890 Phone number
    2222222222 Phone number
    13133456798 Phone number
    23-35 Baker St. Apt 6 Address
    address number 23
    6666666666 Phone number
    1233455678 Phone number
    45 elm street Address
    address number 45
    1233455678x999 Phone number
    6666666666x233 Phone number

    Note that I got lazy and didn't bother pulling out all the numbers within an address string, which I let be lengths other than just your 3 & 4 digit runs. I also miss on numbers like 1-(800)-222-2222, but that's just a little more regex tweaking. I don't strip commas, since I don't think I've ever seen commas used to punctuate a US phone number. They might also be your big flag for lists of apt numbers. If you're dealing with phone numbers in Europe you're probably doomed-- they seem to have random numbers of digits over a very large range.
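    For the part I skipped, pulling every 3- or 4-digit run out of a string that isn't a phone number, something along these lines might do. It's only a sketch: the sub name unit_numbers and the sample strings are made up, and it reuses the same strip-then-measure test as above, so it inherits the same misses.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;

# Hypothetical helper: return all 3- or 4-digit runs in $text, or an empty
# list when the punctuation-stripped copy contains a 7- to 11-digit run.
sub unit_numbers {
    my ($text) = @_;
    (my $stripped = $text) =~ s/\(?(\d+)[-(). ](\d|x\d)/$1$2/g;
    return () if $stripped =~ m/\d{7,11}/;           # treat as a phone number
    return $text =~ m/(?<!\d)(\d{3,4})(?!\d)/g;      # every 3-4 digit run
}

for my $text ('Sent notifications to 1402, 1304, 501',
              '23-35 Baker St. Apt 601',
              '666 666 6666 x233') {
    my @units = unit_numbers($text);
    say "$text => ", (@units ? join(', ', @units) : 'none');
}
```

    Commas aren't stripped, so a comma-separated list of apt numbers stays as separate short runs instead of collapsing into something that looks like a phone number.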