Not knowing what the input file really looks like (but hearing from the OP that it contains "lots of other junk", like addresses, email, etc.), I would tend not to trust this sort of approach. What if some lines have multiple numeric fields, only one of which is a phone number? What about a line like "1340 S. 123rd St Apt. 310"? (After deleting all the non-digits, you get "1340123310": ten digits, which looks like a phone number.) And so on.
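
To make the address problem concrete, here is a minimal sketch in Python (not the OP's code). It assumes 10-digit, US-style numbers, and the function names and the pattern are my own, purely for illustration. It compares the "delete all non-digits" check with a match that insists on phone-like punctuation:

```python
import re

def strip_non_digits(line):
    """Delete every character that is not a digit."""
    return re.sub(r"\D", "", line)

def looks_like_phone_naive(line):
    """The approach being questioned: a line 'is' a phone number
    if exactly 10 digits remain after stripping (assumed length)."""
    return len(strip_non_digits(line)) == 10

# Hypothetical stricter pattern: 3-3-4 digits with separators, or a
# parenthesized area code, and no digits touching either end.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\(\d{3}\)\s*|\d{3}[-. ])\d{3}[-. ]\d{4}(?!\d)"
)

def find_phones(line):
    """Return every substring of the line that matches PHONE_RE."""
    return [m.group(0) for m in PHONE_RE.finditer(line)]

address = "1340 S. 123rd St Apt. 310"
print(strip_non_digits(address))        # 1340123310
print(looks_like_phone_naive(address))  # True  (a false positive)
print(find_phones(address))             # []
print(find_phones("call 555-123-4567 or (555) 987-6543"))
```

Even the stricter pattern is only a guess at the formats in the real file; it would have to be checked against actual samples before being trusted.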