In that case I would be inclined to maintain a list of regexps - one for each allowable format - rather than (I predict) torturing a single one into handling successive new requirements until it finally dies in an agony of unmaintainability. I might even put them in a configuration file rather than in code, for easy update in production environments, then load and split them into an array and try them successively on the data until a match is found or the possible formats are exhausted.
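
A rough sketch of that idea in Python, in case it helps. Everything named here is made up for illustration: the two date formats, the one-pattern-per-line config layout with `#` comments, and the helper names `load_patterns` and `first_match`. In practice the text would come from a real file rather than an inline string.

```python
import re

# Hypothetical config contents: one regexp per line.
# Blank lines and lines starting with '#' are skipped (an assumed convention,
# which also means a pattern cannot itself begin with '#').
CONFIG_TEXT = r"""
# ISO-style date
^\d{4}-\d{2}-\d{2}$
# day/month/year
^\d{2}/\d{2}/\d{4}$
"""

def load_patterns(text):
    """Split the config text into lines and compile each one as a regexp."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(re.compile(line))
    return patterns

def first_match(patterns, value):
    """Try each pattern in order; return the first match object, or None."""
    for pattern in patterns:
        match = pattern.search(value)
        if match:
            return match
    return None

PATTERNS = load_patterns(CONFIG_TEXT)

for sample in ("2024-01-31", "31/01/2024", "Jan 31"):
    match = first_match(PATTERNS, sample)
    print(sample, "->", match.re.pattern if match else "no format matched")
```

Adding a new allowable format then means adding a line to the config rather than editing an existing expression, and the order of the lines decides which format wins when more than one could match.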