Re (tilly) 2: Regex runs for too long

You might be able to get some useful ideas by looking at how RE (tilly) 4: SAS log scanner works and applying that to your matching problem.

The basic reference for understanding performance issues with REs is Mastering Regular Expressions by Friedl. The key is that the way that Perl's RE engine works is that it does a recursive search through ways of partially matching your pattern to the string, every time it fails backing up to the last choice it had and trying again. The problem is that the tree that it has to search can easily be exponential in the size of the string you are trying to match. The pattern /(foo)(.*)*\1\2/ on the string "foo and lots of garbage here..." is a canonical example. Of course Perl works hard to recognize the simple examples of this issue and special cases them so that won't be slow.

The general answer is therefore to write your RE in such a way that the RE makes as few choices as possible, and so that where it does make choices, it makes them as late as possible so that you spend less time backtracking then redoing the same reasoning as possible. In this specific case the best you can do is watch the RE match, and as it goes, trace on pen and paper exactly what choices it is making where. You already know that the RE engine is backtracking, but trying to reason along with it should show you why it is backtracking. (The hard part of the exercise is that you will tend to leap to general conclusions that you know are right, and miss that the computer is going to grind away missing the obvious at great length...)

Comment on Re (tilly) 2: Regex runs for too long