in reply to phone number regex

It's not entirely a regex solution but perhaps something like this could be used?

my %seen;

open(FILE, '<', 'file.txt')
    or die "Unable to open file.txt for reading, $!";
while (<FILE>) {
    chomp;
    tr/0-9//cd;                               # delete everything except digits
    $_ = sprintf("%010s", $_);                # left-pad with zeros to ten characters
    $_ =~ s/(\d{3})(\d{3})(\d{4})/$1-$2-$3/;  # add the dashes
    $seen{$_}++;                              # hash keys keep each number once
}
close(FILE);

open(SAVED, '>', 'saved.txt')
    or die "Unable to open saved.txt for writing, $!";
print SAVED "$_\n" for (sort keys %seen);
close(SAVED);

This will strip all non-digits and pad the left side of the number with zeros if it's less than 10 digits long.
Updated to add the dashes (how'd I miss that?)
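
To make the three steps concrete, here is a minimal sketch that runs them on a single string and returns each intermediate value. The helper name `steps` and the sample text are made up for illustration; they are not part of the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original snippet): applies the same three
# steps to one string and returns every intermediate value.
sub steps {
    my ($line) = @_;
    (my $digits = $line) =~ tr/0-9//cd;        # strip all non-digits
    my $padded = sprintf("%010s", $digits);    # zero-pad on the left to ten characters
    (my $dashed = $padded) =~ s/(\d{3})(\d{3})(\d{4})/$1-$2-$3/;  # insert the dashes
    return ($digits, $padded, $dashed);
}

print join(" | ", steps("Call 526-2900 today")), "\n";
# 5262900 | 0005262900 | 000-526-2900
```

A seven-digit number comes out with a zeroed area code, which is the padding described above.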

Re: Re: phone number regex
by graff (Chancellor) on Mar 10, 2004 at 05:23 UTC
    Not knowing what the input file really looks like (but hearing from the OP that it contains "lots of other junk", like addresses, email, etc), I would tend not to trust this sort of approach. What if some lines have multiple numeric fields, one of which is a phone number? What about a line like "1340 S. 123rd St Apt. 310"? (After deleting all the non-digits, you get something that looks like a phone number.) And so on.
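
The address example can be checked directly. This is only a sketch of the failure mode, using the same `tr` and substitution as the snippet under discussion:

```perl
use strict;
use warnings;

my $line = "1340 S. 123rd St Apt. 310";

(my $digits = $line) =~ tr/0-9//cd;     # "1340123310", exactly ten digits
(my $looks_like_phone = $digits) =~ s/(\d{3})(\d{3})(\d{4})/$1-$2-$3/;

print "$looks_like_phone\n";            # 134-012-3310
```

Nothing in the output hints that it came from a street address rather than a phone number.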

      Details!

      Okay, you caught me napping at the keyboard. I didn't filter the input.

Re: Re: phone number regex
by Anonymous Monk on Mar 10, 2004 at 15:30 UTC
    That works great!! A few questions on this. Every time I run this, I get the number 000-000-0000 on the top of my list even though it's not in my file. Any idea why?

    Also, can you explain what the sprintf line is doing? I know sprintf does something with numbers, but I don't see what it is actually doing here. This matches MORE than 10 characters: if you find a number like 123.231.343.343, it would match the ending 343 as well, so it's not telling it how many digits to match.

    Thanks.

      It's finding a line without any numbers in it. Like I said in my reply to graff, I didn't check the input to see if it matched a phone number.

      The sprintf is creating a string that is a minimum of ten digits long and that is zero padded if the number is less than ten digits. "123.231.343.343" would become "123-231-343343" in the end.
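
A small sketch of that behavior, assuming the same "%010s" format as in the snippet (the helper name `pad10` is made up here). The width is a minimum, not a limit, so longer strings pass through untouched:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the sprintf call from the snippet.
sub pad10 { return sprintf("%010s", $_[0]) }

print pad10(""), "\n";              # 0000000000  (a line with no digits at all)
print pad10("5262900"), "\n";       # 0005262900  (seven digits, padded on the left)
print pad10("123231343343"), "\n";  # 123231343343 (twelve digits, left unchanged)

# The substitution then only touches the first ten digits it finds:
(my $dashed = pad10("123231343343")) =~ s/(\d{3})(\d{3})(\d{4})/$1-$2-$3/;
print "$dashed\n";                  # 123-231-343343
```

The first case is where 000-000-0000 comes from: an empty digit string padded to ten zeros.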

Re: Re: phone number regex
by Anonymous Monk on Mar 13, 2004 at 01:19 UTC
    I tried your code but for some reason it doesn't always work. Here is a snippet of my junk code I need to parse.
    --------------------------------------------------------------------------------
    Residential MLS #: 2122894 Status: Active-NORMLS LP: $148,500 SP: $
    3850 Silsby Rd* University Heights* OH* 44118*-3102* Unit/Lot #: * Area: 1303 Unit Floor #:
    Map Coordinate: C9D3 Subdivision/Complex: *
    Photos: Media: 5 Acres: 0.13 1/2 Yr. Tax : 1259
    County: Cuyahoga* Owner/Agent: No
    Parcel ID# (PIN): 722-15-088* Year Built: 1940* Lot Dimensions: 40x140
    School District: 1810/Cleveland Hts-Univ Hts City List Type: ERS Irregular: N
    High School: MLS Cross Ref #:
    Sub Property Type: One Family
    List Date: 1/4/2004 MT: 55
    Directions: Between S.Taylor & Warrensville Center Rds.,south of Cedar Rd.
    # Rooms: 6 # Bedrooms: 3* Total Baths: 1.1 Finished SqFt: 1209*
    LO #/Name: 2710 / Realty One (440) 526-2900 Office Web Site: www.realtyone.com
    LA #/Name: 450365 / Cindy Czepczynski (440) 582-7119 LA Email: c.czepczynski@realtyone.com
    LA 2 #/Name: / LA 2 Email:
    SAC: 0 BAC: 3 OAC: *Graduated
    LockBox Desc: Combination
    Compensation Explain: 3% on %100,000 plus 2% on remainder. Fixer Upper: N
    Remarks: FRESH,CLEAN,UPDATED FROM TOP TO BOTTOM! ROOF'02,FURNACE'03,WINDOWS'01,BSMT WATERPROOFED'02,REMODELED KITCHEN'01,REMODELED FBA'03,REMODELED HBA'02,MANY MORE UPDATES! LOCATED IN DESIRABLE AREA. COZY WBFP IN FAM RM,WALK-UP ATTIC FOR STORAGE,REC/PLAY AREA IN BSMT,FRESHLY PAINTED THROUGHOUT. MOVE RIGHT IN!!!
    Broker Remarks: SUBJECT TO SELLERS FINDING HOME OF CHOICE.
    --------------------------------------------------------------------------------
    Residential MLS #: 2121062 Status: Active-NORMLS LP: $148,500 SP: $
    10761 MEADOWBROOK PARMA HEIGHTS OH 44130- Unit/Lot #: Area: 402 Unit Floor #:
    Map Coordinate: C22B3 Subdivision/Complex:
    Photos: Media: 6 Acres: 1/2 Yr. Tax : 1271
    County: Cuyahoga Owner/Agent:
    Parcel ID# (PIN): 47412006 Year Built: 1965 Lot Dimensions: 50x331
    School District: 1824/Parma City List Type: ERS Irregular: N
    High School: MLS Cross Ref #:
    Sub Property Type: One Family
    List Date: 12/13/2003 MT: 77
    Directions: OFF YORK ROAD
    # Rooms: # Bedrooms: 3 Total Baths: 2 Finished SqFt: 1192
    LO #/Name: 2269 / Prudential Farina 1st American (440) 888-2300 Office Web Site:
    LA #/Name: 332633 / Janice Burton (440) 886-5941 LA Email: jburtonc21@aol.com
    LA 2 #/Name: / LA 2 Email:
    SAC: 3 BAC: 3 OAC: *Graduated, Dual
    LockBox Desc:
    Compensation Explain: 3% OF $100,000 AND 2 1/2% REMAINDER Fixer Upper: N
    Remarks: ALL BRICK, MOVE-IN CONDITION! EAT-IN KITCHEN WITH CERAMIC TILE INCLUDES ALL APPLIANCES!FORMAL DINING ROOM! NEWER WINDOWS! SHARP FINISHED BASEMENT W/DECORATIVE FIREPLACE AND BAR! EXERCISE ROOM AND LAUNDRY ROOM!VERY PRIVATE, LARGE LOT BACKS TO TRI-C! HOME WARRANTY! BSMT WATERPROOFED W/WARRANTY! WON'T LAST!
    Broker Remarks:
    --------------------------------------------------------------------------------
    Residential MLS #: 2114484 Status: Active-NORMLS LP: $148,500 SP: $
    33964 Morning Glory Ln North Ridgeville Oh 44039- Unit/Lot #: Area: 505 Unit Floor #: 33964
    Map Coordinate: L07D2 Subdivision/Complex: Wildflower
    Photos: Media: 6 Acres: 1/2 Yr. Tax : 1114
    County: Lorain Owner/Agent:
    Parcel ID# (PIN): 07 00 008 704 008 Year Built: 1997 Lot Dimensions:
    School District: 4711/North Ridgeville City List Type: ERS Irregular: N
    High School: N RIDGEVILLE MLS Cross Ref #:
    Sub Property Type: Condominium
    List Date: 10/21/2003 MT: 130
    Directions: CENTER RIDGE NORTH ON WILDFLOWER LEFT ON MORNING GLORY
    # Rooms: 5 # Bedrooms: 2 Total Baths: 2 Finished SqFt: 1372
    LO #/Name: 2802 / Smythe, Cramer Co. (440) 888-5353 Office Web Site: www.smythecramer.com
    LA #/Name: 438395 / Robert Miller (440) 979-5783 LA Email: rmiller@smythecramer.com
    LA 2 #/Name: / LA 2 Email:
    SAC: 0 BAC: 3 OAC: *Graduated
    LockBox Desc:
    Compensation Explain: 3%/1ST $100k,2% REMAINDER Fixer Upper: N
    Remarks: ELEGANT ATTACHED RANCH CONDO! CORIAN COUNTERS CERAM TILE & NEWER CARPETING, VAULTED & 9`CEILINGS & A FRESH, NEUTRAL DECOR. OWNERS SUITE W/FBA & WALK-IN. THE CONVENIENCE OF A COVERED ENTRY, 2CAR ATT & LAUNDRY ROOM. THE EXTERIOR HAS BEEN ENHANCED BY A RAISED & FENCED DECK & GARDENS. YOU`LL LOVE IT!
    Broker Remarks: NO SHOWINGS BEFORE NOON
    --------------------------------------------------------------------------------
    Prepared by: Mary Ann Zahand / (440) 878-6296 Information is Believed To Be Accurate But Not Guaranteed Date Printed: Fri, Feb 27, 2004
    --------------------------------------------------------------------------------
    Residential MLS #: 2113958 Status: Active-NORMLS LP: $148,500 SP: $
    1490 BENNETT RD MADISON OH 44057- Unit/Lot #: Area: 1122 Unit Floor #:
    Map Coordinate: L2A2 Subdivision/Complex:
    Photos: Media: 7 Acres: 0.58 1/2 Yr. Tax : 1130
    County: Lake Owner/Agent:
    Parcel ID# (PIN): 01B102000034 Year Built: 1989 Lot Dimensions: 90X280
    School District: 4303/Madison Local List Type: ERS Irregular: N
    High School: MADISON MLS Cross Ref #:
    Sub Property Type: One Family
    List Date: 10/17/2003 MT: 134
    Directions: RT 20-N ON BENNETT-N OF MADISON AVE W/S
    # Rooms: 7 # Bedrooms: 3 Total Baths: 2 Finished SqFt: 1600
    LO #/Name: 2832 / Smythe, Cramer Co. (440) 428-1818 Office Web Site: www.smythecramer.com
    LA #/Name: 272843 / Mary Ann Hubbard (440) 223-7653 LA Email: maryannhubbard@alltel.net
    LA 2 #/Name: / LA 2 Email:
    SAC: 0 BAC: 3.0 OAC: *Graduated
    LockBox Desc: Combination
    Compensation Explain: 3.0% ON 1ST $100,000 & 2& AFTER Fixer Upper: N
    Remarks: YOU'LL BE PLEASANTLY SURPIRSIED!/FRESHLY PAINTED, SOFTLY DECORATED, OPEN & AIRY/DRAMATIC LR W/VAULTED CEILING, FP & INDIRECT LIGHTING/MSTR SUITE W/PRIV. BATH & WALK IN CLOSET/SPARKLING WHITE KIT. W/TONS OF CABINETS, ALL APP'L, BREAKFAST AREA/COZY DEN/BEAUTIFUL LANDSCAPING, TREED LOT
    Broker Remarks:
    --------------------------------------------------------------------------------
    It skips nearly all phone numbers and picks up nearly everything that's NOT a phone number using your example. Any suggestions?

      I don't generally give fully functional programs out at the drop of a hat. The snippet that I did provide lacks a good input filter, which was pointed out in the replies to it.
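
For readers wondering what such an input filter might look like, here is one possible sketch, not a finished program. It assumes every phone number in the data is written as "(AAA) PPP-NNNN", as in the listing sample above; the name `extract_phones` and the shortened sample string are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: phone numbers always look like "(440) 526-2900".
# Anything else (prices, parcel IDs, ZIP codes, MLS numbers) is ignored.
my $phone_re = qr/\((\d{3})\)\s*(\d{3})-(\d{4})/;

# Hypothetical helper: returns the unique, dashed phone numbers in a string.
sub extract_phones {
    my ($text) = @_;
    my %seen;
    while ($text =~ /$phone_re/g) {
        $seen{"$1-$2-$3"}++;
    }
    return sort keys %seen;
}

# Single quotes keep Perl from interpolating "$148" and similar.
my $sample = 'LO #/Name: 2710 / Realty One (440) 526-2900 Office Web Site: '
           . 'Parcel ID# (PIN): 722-15-088* LP: $148,500 Area: 1303';

print "$_\n" for extract_phones($sample);   # 440-526-2900
```

Matching the shape of a phone number instead of deleting non-digits means lines without a phone number produce nothing, so the 000-000-0000 entry and the address false positives should not appear. Numbers written in other formats would need more patterns.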