A few random thoughts that I had some time ago about regex complexity. They are far from comprehensive, not directly usable as a complexity measure, and very personally biased, but I hope they provide food for thought.
Regexes are made of atoms (an atom is something like foobar or \d), groups (which can either capture or not), alternations and quantifiers.
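To make the four building blocks concrete, here is a minimal sketch in Python's re module (my choice for illustration; the post itself is about Perl regexes). The pattern and the name `pattern` are made up for this example.

```python
import re

# Hypothetical pattern, chosen only to show each building block once.
pattern = re.compile(
    r"(?:foo|bar)"  # non-capturing group holding an alternation of two literal atoms
    r"(\d+)"        # capturing group: the atom \d with the quantifier +
    r"$"            # an anchor, which is also an atom
)

m = pattern.search("xbar42")
print(m.group(0))  # whole match: bar42
print(m.group(1))  # captured digits: 42
```

Each piece is easy to read on its own; the difficulty discussed here only shows up once such pieces are nested and combined.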
Regexes are visually rather hard to parse if they have many groups, possibly nested.
As for the mental complexity (i.e. the effort of working out what a regex does), note that:
- Most atoms are very easy to understand, independently of whether they are meta-syntactic (like \d or anchors such as ^) or literals (like foobar)
- Grouping things doesn't make them harder to understand, if you do it with simple (...) or (?:...). The complexity of non-backtracking groups (?>...) is debatable: sometimes they make things much more intuitive to understand, sometimes they are counter-intuitive.
- The complexity of a character class scales roughly linearly with the number of atoms in it, independently of whether it is negated
- Look-arounds are hard to get right, look-arounds that are within quantified groups are even harder.
- Code assertions... don't even think about them
- Back-references are hard, but not as hard as look-arounds.
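The difference between the harder items in the list can be sketched with three small patterns. This is an illustration in Python's re module, not Perl, and the pattern names (`doubled`, `before_usd`, `no_foo`) are hypothetical. Code assertions are a Perl feature with no counterpart in Python's re, so they are left out.

```python
import re

# Back-reference: \1 must repeat exactly what group 1 captured.
doubled = re.compile(r"\b(\w+) \1\b")

# Plain look-ahead: digits only count if " USD" follows; " USD" is not consumed.
before_usd = re.compile(r"\d+(?= USD)")

# Look-ahead inside a quantified group: at every position, check that "foo"
# does not start here, then consume one character. The net effect is
# "a string that does not contain foo", which is far from obvious at a glance.
no_foo = re.compile(r"^(?:(?!foo).)*$")

print(doubled.search("this is is a test").group(1))  # is
print(before_usd.search("costs 100 USD").group(0))   # 100
print(bool(no_foo.match("barbaz")))                  # True
print(bool(no_foo.match("barfoo")))                  # False
```

The back-reference can be read left to right. The last pattern needs the reader to simulate the assertion once per iteration of the quantifier, which is why look-arounds in quantified groups rank as the hardest of the three.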