That's a good suggestion. I'll experiment with the alternatives a bit and see if I can come up with a good example, including tests and links. I still have to flesh out the desired outcome. I assume that many refactorings could be implemented knowing only the AST annotated with source positions, and I would like to try writing something like that in Perl. I now notice that using substring references wouldn't actually be a good idea: assigning a different-length string to one of the references would invalidate the others, because they would no longer point to their tokens. I'll see if I can find out how refactoring tools in major IDEs handle this; I'm not sure whether they keep code formatting intact or just pretty-print the new AST in a canonical way.
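To illustrate the substring-reference problem, here is a minimal sketch. The variable names and the two-token source string are made up for the demonstration, and it relies on the behaviour described in perldoc -f substr, where an lvalue created with a non-negative offset keeps that offset after the target string changes.

```perl
use strict;
use warnings;

my $src = "foo bar";

# Each reference remembers an offset and a length into $src.
my $foo = \substr($src, 0, 3);   # covers "foo"
my $bar = \substr($src, 4, 3);   # covers "bar"

# Replace the first token with a longer string through its reference.
$$foo = "quux";

print "source: '$src'\n";    # source: 'quux bar'
print "second: '$$bar'\n";   # still reads from offset 4, so it is no longer 'bar'
```

Keeping plain offsets and adjusting the later ones after each edit would avoid this, at the cost of extra bookkeeping.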

About your thoughts:

While the example doesn't require recursive parsing, the Go grammar does: for example, a statement parser can invoke a block parser, which in turn invokes the statement parser. When I update the post with a clear desired outcome and tests, I will also update the example to better match the complexity of the task while keeping it small.
Because of the recursive calls, it's necessary to use (DEFINE) instead of interpolating $vars.
The trade-offs I see are these: with one big regex it's easier to see at a glance how the grammar rules are connected, especially with syntax highlighting, whereas with split regexes it's possible to set breakpoints in specific grammar rules for debugging. The split approach is also less esoteric, so it's better suited for writing widely readable code.
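As a rough sketch of the mutual recursion described above, the following uses a toy grammar, not the Go grammar: a statement is either a block or a lowercase word followed by a semicolon, and a block is a pair of braces around zero or more statements. The rule names stmt and block and the helper parses_as_block are invented for this example.

```perl
use strict;
use warnings;

# Toy grammar: the two rules call each other, which plain variable
# interpolation cannot express, but named groups inside (DEFINE) can.
my $grammar = qr/
    (?(DEFINE)
        (?<stmt>  (?&block) | [a-z]+ \s* ; )
        (?<block> \{ \s* (?: (?&stmt) \s* )* \} )
    )
/x;

# Returns 1 if the whole string is a single block, 0 otherwise.
sub parses_as_block {
    my ($src) = @_;
    return $src =~ m/\A \s* (?&block) \s* \z $grammar/x ? 1 : 0;
}

print parses_as_block("{ a; { b; c; } }"), "\n";   # 1
print parses_as_block("{ a; { b; }"), "\n";        # 0 (missing closing brace)
```

The (DEFINE) group never matches anything itself; it only declares the named rules, so it can be interpolated at the end of whichever pattern picks the start rule.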

For reference, I will explain my understanding of the embedded code in the big regex for parsing JSON. I will describe the data structure returned by the embedded code as a tree to make the explanation clearer:

After returning from each rule, the payload (the result of the last embedded code expression) holds the previous payload in the left subtree ($^R->[0]) and the rule's payload in the right subtree ($^R->[1]).
In order to make this invariant hold:

the payload for a rule with 0 or more matches of a subrule must be set at the rule's beginning for the 0-matches case
when combining payloads from multiple subrule matches (from a Kleene star or from a rule having multiple subrules), the right branches of the tree must be combined in the right subtree and the left subtree is set to the only left branch in the tree
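The invariant can be sketched with a much smaller grammar than the JSON one. The following is my own reduced example, not the regex from the thread: it parses a bracketed list of integers, and the names LIST, NUMBER, $parsed and parse_int_list are made up for it. The comments inside the pattern deliberately avoid sigils, since variables in regex comments can be interpolated.

```perl
use strict;
use warnings;

our $parsed;   # package variable filled in by the final code block

my $list_re = qr/
    \A (?&LIST) \z
    (?{ $parsed = $^R->[1] })     # right subtree of the final payload
    (?(DEFINE)
        (?<LIST>
            # Set the payload up front so the zero-element case works:
            # left is the previous payload, right is an empty list.
            (?{ [$^R, []] })
            \[ \s*
            (?: (?&NUMBER)
                # Payload here is [[previous, list], number]; fold the
                # number into the list and keep the single left branch.
                (?{ [$^R->[0][0], [@{$^R->[0][1]}, $^R->[1]]] })
                \s*
                (?: , \s* (?&NUMBER)
                    (?{ [$^R->[0][0], [@{$^R->[0][1]}, $^R->[1]]] })
                    \s*
                )*
            )?
            \]
        )
        (?<NUMBER>
            ( -? \d+ )
            # Left is the previous payload, right is this rule's value.
            (?{ [$^R, 0 + $^N] })
        )
    )
/x;

# Returns an array reference on success, undef if the text does not match.
sub parse_int_list {
    my ($text) = @_;
    local $parsed;
    return $text =~ $list_re ? $parsed : undef;
}

print join(",", @{ parse_int_list("[1, 22,-3]") }), "\n";   # 1,22,-3
```

Because $^R is restored when the regex engine backtracks, a failed alternative does not leave a half-built list behind.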

In reply to Re^2: Positions of certain tokens in syntax tree (updated) by rubystallion
in thread Positions of certain tokens in syntax tree by rubystallion