Re^2: Positions of certain tokens in syntax tree (updated)

That's a good suggestion. I'll experiment with the alternatives a bit and see if I can come up with a good example including tests and links. I still have to flesh out the desired outcome. I assume that many refactorings could be implemented with nothing more than the AST annotated with source positions, and I would like to try writing something like that in Perl. I now realize that using substring references wouldn't be a good idea: assigning a string of a different length to one of the references would invalidate the others, because they would no longer point to their tokens. I'll try to find out how the refactoring tools in major IDEs handle this. I'm not sure whether they keep the code formatting intact or simply pretty-print the new AST in a canonical way.
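To make the substring-reference problem concrete, here is a minimal sketch. The sample source string and the helper replace_token are made up for illustration. The second half shows one possible alternative (plain offset/length pairs that get shifted after an edit), which is only an assumption about how this could be handled, not something I have settled on.

```perl
use strict;
use warnings;

my $src = 'foo := bar';

# LVALUE references to two tokens: (offset 0, length 3) and (offset 7, length 3).
my $lhs = \substr($src, 0, 3);
my $rhs = \substr($src, 7, 3);
print "before: '${$lhs}' '${$rhs}'\n";

# Assigning a longer string through the first reference shifts the rest of
# the source, but the second reference still uses its old offset.
${$lhs} = 'counter';
print "source: '$src'\n";
print "after:  '${$rhs}'\n";    # no longer the token 'bar'

# Hypothetical alternative: keep plain [offset, length] pairs and shift
# every later span by the length difference after an edit.
sub replace_token {
    my ($text_ref, $spans, $i, $new) = @_;
    my ($off, $len) = @{ $spans->[$i] };
    substr($$text_ref, $off, $len, $new);
    my $delta = length($new) - $len;
    $spans->[$i][1] = length $new;
    $_->[0] += $delta for grep { $_->[0] > $off } @$spans;
    return;
}

my $text  = 'foo := bar';
my @spans = ([0, 3], [7, 3]);
replace_token(\$text, \@spans, 0, 'counter');
print substr($text, $spans[1][0], $spans[1][1]), "\n";
```

With the offset table, the second span still selects its token after the edit, at the cost of having to update every later span on each change.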

About your thoughts:

While the example doesn't require recursive parsing, the Go grammar does: for example, a statement parser can invoke a block parser, which in turn invokes the statement parser again. When I update the post with clear desired outcomes and tests, I will also update the example to better match the complexity of the task while still keeping it small.
Because of the recursive calls, it's necessary to use (DEFINE) instead of interpolating $vars.
The trade-offs I see are these: with one big regex it's easier to see at a glance how the grammar rules are connected, especially with syntax highlighting, whereas with split regexes it's possible to set breakpoints in specific grammar rules for debugging. Split regexes are also a less esoteric solution, so they are better suited for writing widely readable code.
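As a rough sketch of the recursion point: two qr// variables cannot interpolate each other, because one of them would not exist yet when the other is compiled, whereas named rules inside (DEFINE) can call each other freely via (?&name). The toy grammar below (rule names stmt and block, identifiers followed by a semicolon) is made up for illustration and is far simpler than the Go grammar.

```perl
use strict;
use warnings;

# Mutually recursive rules: a statement may be a block, and a block
# contains statements. Nothing inside (DEFINE) matches by itself.
my $grammar = qr/
    (?(DEFINE)
        (?<stmt>  \s* (?: (?&block) | [a-z]+ \s* ; ) \s* )
        (?<block> \{ (?&stmt)* \s* \} )
    )
/x;

# Returns 1 if the whole string is one well-formed block, else 0.
sub is_block {
    my ($src) = @_;
    return $src =~ /\A (?&block) \z $grammar/x ? 1 : 0;
}

for my $src ('{ a; { b; c; } }', '{ a; { b; }') {
    printf "%-20s %s\n", $src, is_block($src) ? 'matches' : 'does not match';
}
```

The rule definitions are interpolated once as a whole, so the pattern that uses them can refer to any rule by name regardless of the order in which the rules are written.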

For reference, I will explain my understanding of the embedded code in the big regex for parsing JSON. I will describe the data structure returned by the embedded code as a tree to make the explanation clearer:

After returning from each rule, the payload (the result of the last embedded code expression) holds the previous payload in its left subtree ($^R->[0]) and the rule's own payload in its right subtree ($^R->[1]).
In order to make this invariant hold:

the payload for a rule with 0 or more matches of a subrule must be set at the rule's beginning, to cover the 0-matches case
when combining payloads from multiple subrule matches (from a Kleene star or from a rule having multiple subrules), the right branches of the tree must be combined into the right subtree, and the left subtree is set to the only left branch in the tree
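The invariant can be sketched on a much smaller grammar than JSON. The example below is my own toy (a bracketed list of integers), not Merlyn's code; the names parse_int_list and $result are hypothetical. The first code block sets the empty payload at the rule's beginning for the 0-matches case, and each later block keeps the left subtree and extends the right one.

```perl
use strict;
use warnings;

our $result;    # hypothetical output slot, filled by the last code block

# Invariant after each code block: $^R is [ previous payload, rule payload ].
# The left subtree ($^R->[0]) is carried along unchanged; the right subtree
# ($^R->[1]) collects the numbers matched so far.
sub parse_int_list {
    my ($src) = @_;
    local $result;
    $src =~ /
        \A \s* \[ \s*
        (?{ [ $^R, [] ] })
        (?:
            (\d+) \s*
            (?{ [ $^R->[0], [ @{ $^R->[1] }, $^N ] ] })
            (?:
                , \s* (\d+) \s*
                (?{ [ $^R->[0], [ @{ $^R->[1] }, $^N ] ] })
            )*
        )?
        \] \s* \z
        (?{ $result = $^R->[1] })
    /x or return;
    return $result;
}

print join('+', @{ parse_int_list('[1, 22, 333]') }), "\n";
print scalar @{ parse_int_list('[]') }, "\n";
```

Because $^R is restored on backtracking, a payload built inside an alternative that later fails is discarded automatically, which is what makes this style workable.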


Replies are listed 'Best First'.

Re^3: Positions of certain tokens in syntax tree (updated)
by LanX (Saint) on Dec 16, 2019 at 23:57 UTC

Hi, sorry, I'm too busy right now to dive deeper into your thoughts, but

> When I update the post with a clear desired outcomes and tests, I will update the example to better match the complexity of the task, while still keeping it small.

Please don't.

If you want to be seen, either reply to the OP or start a new thread while referencing this.

BTW: Merlyn's JSON parser has embedded Perl code for debugging, it was just commented out.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
FootballPerl is like chess, only without the dice
