Very nice read, ++.
After the Perl 5.10 release I started writing a regex that parses XML, and ran into similar problems.
What I learned while crafting that expression:
- Nested regexes are complicated
- They are still buggy (I found two bugs in the regex engine in two days)
- There's a reason we want Perl 6 regexes/rules
- (?{print "matched so far: $&\n"}) blocks really help while debugging
- Use atomic groups, (?>...), wherever possible, not only for performance but also to keep embedded closures from being executed several times during backtracking, which gets confusing
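
A minimal sketch of the last two points, not taken from the XML regex itself: the input string and the variable names (@plain_trace, @atomic_trace) are made up for illustration. It records pos() inside a (?{...}) block to show how often the block runs with and without an atomic group when the rest of the pattern fails.

```perl
use strict;
use warnings;

# Package variables, so the embedded code blocks can update them
# without relying on how a given Perl version handles lexicals there.
our (@plain_trace, @atomic_trace);

# Hypothetical input: digits, a space, then something the pattern rejects.
my $input = "12345 x";

# The tail uses \s and a character class rather than a literal string,
# because a missing literal can let Perl reject the input up front
# without ever running the code block.

# Plain quantifier: every time \d+ gives back a digit during
# backtracking, the code block is entered again.
my $plain_matched =
    $input =~ /^\d+(?{ push @plain_trace, pos() })\s[yz]/ ? 1 : 0;

# Atomic group: (?>\d+) never gives digits back, so the block
# is entered only once before the match fails.
my $atomic_matched =
    $input =~ /^(?>\d+)(?{ push @atomic_trace, pos() })\s[yz]/ ? 1 : 0;

print "plain:  block ran ", scalar(@plain_trace),
      " time(s), at positions @plain_trace\n";
print "atomic: block ran ", scalar(@atomic_trace),
      " time(s), at positions @atomic_trace\n";
```

Neither pattern matches this input; the difference is only in how many times the closure fires on the way to failing. With a closure that has side effects (building a parse tree, say), those extra runs are the confusing part.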