There are certainly cleverer ways to do it, but for small data sets (anything that takes less than a few minutes to process) I'd just make multiple passes. On the first pass, I'd find all the phone numbers and wrap them in some kind of tag so that I know to ignore them later; typically an XML-ish tag, but really anything that won't show up in the regular data, like a double underscore, will do. On the second pass, I'd write the regex and/or logic so that it skips anything tagged as a phone number. If I get motivated later this evening I'll post some code.
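
For what it's worth, here is a rough Python sketch of the two-pass idea. Everything specific in it is an assumption on my part: the phone pattern only covers US-style numbers like 555-123-4567 or (555) 123-4567, the `<phone>` tag is just one possible marker, and the second pass looks for standalone 4-digit numbers purely as a stand-in for whatever you actually want to match.

```python
import re

# Assumption: US-style phone numbers, e.g. 555-123-4567 or (555) 123-4567.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")

# Second-pass pattern: the first alternative swallows a whole tagged phone
# number so its digits are never examined; the second alternative captures
# the hypothetical "real" target (a standalone 4-digit number) in group 1.
TAGGED_OR_TARGET_RE = re.compile(r"<phone>.*?</phone>|\b(\d{4})\b")


def tag_phones(text):
    """Pass 1: wrap every phone number in <phone>...</phone>."""
    return PHONE_RE.sub(lambda m: f"<phone>{m.group(0)}</phone>", text)


def find_targets(tagged_text):
    """Pass 2: return 4-digit numbers that are not inside a phone tag."""
    return [
        m.group(1)
        for m in TAGGED_OR_TARGET_RE.finditer(tagged_text)
        if m.group(1) is not None  # None means we matched a tagged phone
    ]


def strip_tags(tagged_text):
    """Remove the temporary markers once processing is done."""
    return tagged_text.replace("<phone>", "").replace("</phone>", "")


if __name__ == "__main__":
    raw = "Call 555-123-4567 about order 8841, or (555) 987-6543 ext 1200."
    tagged = tag_phones(raw)
    print(tagged)
    print(find_targets(tagged))  # ['8841', '1200']
```

Without the tagging pass, a plain `\b\d{4}\b` search on the raw text would also pick up `4567` and `6543` from the phone numbers. The alternation in the second pass is only one way to skip tagged spans; splitting the text on the tags and processing only the untagged pieces would work just as well.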