That's a good suggestion. I'll experiment with the alternatives a bit and see if
I can come up with a good example including tests and links. I'll still have to
flesh out the desired outcome.

I assume that many refactorings could be implemented by just knowing the AST
annotated with source positions, and I would like to try to write something like
that in Perl.

Now I notice that using substring references actually wouldn't be a good idea,
because assigning a different-length string through one of the references shifts
the text behind it, so the other references would no longer point at their
tokens. I'll see if I can find out how refactoring tools in major IDEs do this.
I'm not sure if they keep code formatting intact or if they just pretty-print
the new AST in a canonical way.
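
A minimal sketch of the substring-reference problem, assuming tokens are held as
references to lvalue substr() calls. The source string and the variable names
($src, $name, $value) are made up for illustration:

```perl
use strict;
use warnings;

my $src = "var x = 1";

# Hypothetical token references: each one is an lvalue reference
# into $src with a fixed offset and length.
my $name  = \substr($src, 4, 1);   # the identifier "x"
my $value = \substr($src, 8, 1);   # the literal "1"

# Rename the identifier to something longer.
$$name = "count";

print "$src\n";      # var count = 1
print "$$value\n";   # no longer "1": offset 8 now falls inside "count"
```

The second reference keeps its original offset, so after the rename it reads a
character of the new identifier instead of the literal.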
About your thoughts:
- While the example doesn't require recursive parsing, the Go grammar does
require some recursive parsing, for example a statement parser can invoke a
block parser, which then again invokes the statement parser. When I update the
post with clear desired outcomes and tests, I will update the example to
better match the complexity of the task, while still keeping it small.
- Because of the recursive calls it's necessary to use (DEFINE) instead of
interpolating $vars, since an interpolated regex can't refer to itself.
- The trade-off I see is that with one big regex it's easier to see at a glance
how the grammar rules are connected, especially with syntax highlighting,
whereas with split regexes it's possible to set breakpoints in specific
grammar rules for debugging. Splitting is also a less esoteric solution, so
it's better suited for writing widely readable code.
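
To make the recursion point concrete, here is a small sketch of mutually
recursive rules written with (DEFINE). The toy grammar (a block is braces around
statements, a statement is either a nested block or a word followed by a
semicolon) and the names block, stmt and is_block are assumptions for
illustration, not the Go grammar:

```perl
use strict;
use warnings;

my $block_re = qr{
    \A \s* (?&block) \s* \z
    (?(DEFINE)
        (?<block> \{ \s* (?: (?&stmt) \s* )* \} )
        (?<stmt>  (?&block) | \w+ \s* ; )
    )
}x;

# Returns 1 if the whole string is one well-formed block, else 0.
sub is_block { return $_[0] =~ $block_re ? 1 : 0 }

print is_block("{ a; { b; c; } }"), "\n";   # 1
print is_block("{ a; { b; }"), "\n";        # 0 (outer block is not closed)
```

Here block calls stmt and stmt calls block again, which is the kind of cycle
that can't be expressed by interpolating one variable into another.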
For reference, I will explain my understanding of the embedded code in the big regex for parsing JSON. I will describe the data structure returned by the embedded code as a tree to make the explanation clearer:
- After returning from each rule, the payload (the result of the last embedded code expression) holds the previous payload in the left subtree ($^R->[0]) and the rule's payload in the right subtree ($^R->[1]).
- In order to make this invariant hold:
  - the payload for a rule with 0 or more matches of a subrule must be set at the rule's beginning for the 0-matches case
  - when combining payloads from multiple subrule matches (from a Kleene star or from a rule having multiple subrules), the right branches of the tree must be combined in the right subtree and the left subtree is set to the only left branch in the tree
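
The invariant can be tried out on something much smaller than JSON. The sketch
below parses a bracketed list of integers in the same style. The rule names LIST
and NUM, the variable $parsed and the helper parse_list are my own assumptions
for illustration and are not taken from the JSON regex:

```perl
use strict;
use warnings;

our $parsed;

my $list_re = qr{
    \A (?&LIST) \z
    (?{ $parsed = $^R->[1] })
    (?(DEFINE)
        (?<LIST>
            (?{ [ $^R, [] ] })
            \[
            (?:
                (?&NUM)
                (?{ [ $^R->[0][0], [ @{ $^R->[0][1] }, $^R->[1] ] ] })
                (?:
                    , (?&NUM)
                    (?{ [ $^R->[0][0], [ @{ $^R->[0][1] }, $^R->[1] ] ] })
                )*
            )?
            \]
        )
        (?<NUM>
            ( \d+ )
            (?{ [ $^R, $^N ] })
        )
    )
}x;

# Returns an array reference of the numbers, or undef if the text is not a list.
sub parse_list {
    my ($text) = @_;
    $parsed = undef;
    return $text =~ $list_re ? $parsed : undef;
}

print join(",", @{ parse_list("[1,22,3]") }), "\n";   # 1,22,3
```

The first code block in LIST sets the payload for the 0-matches case. After each
NUM, $^R is [ [ previous, items ], number ], and the following code block folds
the number into the right subtree while keeping the single left branch
($^R->[0][0]).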