in reply to Assessing the complexity of regular expressions

Here are a few random thoughts I had some time ago about regex complexity. They are far from comprehensive, not directly usable as a complexity measure, and very much personally biased, but I hope they provide food for thought.

Regexes are made of atoms (an atom is something like foobar or \d), groups (which can either capture or not), alternations and quantifiers.

Regexes with many groups, especially nested ones, are rather hard to parse visually.
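One rough way to put a number on "many groups, possibly nested" is to measure how deep the parentheses go. The following is only a sketch in Python, not a real regex parser: the function name max_group_depth is made up for illustration, and the scanner assumes that backslash escapes and simple bracketed character classes are the only places where a parenthesis is literal.

```python
def max_group_depth(pattern: str) -> int:
    """Return the deepest nesting level of groups in a regex (rough sketch).

    Assumption: only backslash escapes and [...] character classes can
    contain literal parentheses. Corner cases such as a literal ']' at the
    start of a class are not handled.
    """
    depth = 0
    deepest = 0
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
        i += 1
    return deepest


print(max_group_depth(r"a(b(c|d))e"))  # prints 2
print(max_group_depth(r"\(a\)"))       # prints 0, both parentheses are escaped
```

Depth alone says nothing about how many groups there are side by side, so it would only be one input among several.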

For the mental complexity (i.e., trying to assess what a regex does) you have to note that

Re^2: Assessing the complexity of regular expressions
by kyle (Abbot) on Jan 28, 2009 at 17:13 UTC

    Thanks for your thoughts! Your biases are what I was looking for. I was hoping that with enough input there would be a consensus (e.g., "look-around assertions are confusing to everyone"). Perhaps I should have asked what people have the most trouble getting right, what they find themselves fixing most often, or just what they use the most.

    This all seems to favor a "score the tree" approach. As I think about that, I wonder if it makes sense to give the user some input into the scoring. That is, someone could say, "I can't ever understand code assertions, so they're of the highest complexity, but I write look-arounds in my sleep, so they're low complexity". On the other hand, it's supposed to be a tool for maintainability, so it makes more sense for things to be scored as the typical programmer sees them.

    Anyway, I appreciate you sharing your thinking on this.
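The idea of scoring with user-adjustable weights, raised in the reply above, could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the feature names, the default weights, and the helper names count_features and score are hypothetical, and the scanner walks the pattern text instead of a real parse tree. It recognizes only look-arounds, capturing and non-capturing groups, alternations and quantifiers; code assertions such as (?{...}) are not handled.

```python
from collections import Counter

# Hypothetical default weights, meant to stand in for the "typical programmer".
DEFAULT_WEIGHTS = {
    "lookaround": 5,
    "capture_group": 2,
    "noncapture_group": 1,
    "alternation": 2,
    "quantifier": 1,
}


def count_features(pattern: str) -> Counter:
    """Count a few regex constructs with a crude left-to-right scan."""
    counts = Counter()
    i, n = 0, len(pattern)
    in_class = False
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if in_class:
            in_class = ch != "]"
            i += 1
            continue
        if ch == "[":
            in_class = True
        elif ch == "(":
            rest = pattern[i + 1:i + 4]
            if rest.startswith(("?=", "?!", "?<=", "?<!")):
                counts["lookaround"] += 1
            elif rest.startswith("?:"):
                counts["noncapture_group"] += 1
            else:
                counts["capture_group"] += 1
            if rest.startswith("?"):
                i += 1  # skip the '?' so it is not counted as a quantifier
        elif ch == "|":
            counts["alternation"] += 1
        elif ch in "*+?{":
            counts["quantifier"] += 1
            if ch == "{":
                close = pattern.find("}", i)
                if close != -1:
                    i = close  # jump to the end of {m,n}
            # A trailing ? or + only modifies the quantifier (lazy/possessive).
            if i + 1 < n and pattern[i + 1] in "?+":
                i += 1
        i += 1
    return counts


def score(pattern: str, overrides=None) -> int:
    """Weighted sum of feature counts; overrides replace default weights."""
    weights = {**DEFAULT_WEIGHTS, **(overrides or {})}
    counts = count_features(pattern)
    return sum(weights.get(name, 0) * count for name, count in counts.items())


pattern = r"(?<=\d)(foo|bar)+?(?:baz)*"
print(score(pattern))                     # prints 12 with the default weights
print(score(pattern, {"lookaround": 1}))  # prints 8 for a look-around expert
```

Keeping the defaults separate from the per-user overrides leaves both options open: a shared baseline for maintainability, and personal tuning for anyone who wants it.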