loomis53 has asked for the wisdom of the Perl Monks concerning the following question:

Taking an example grammar from the FAQ:
    my $grammar = q{
        list: <leftop: item ',' item>
        item: word list <commit>
            | word
        word: /\w+/
    };
With it you can parse a nested list like a, b, (c, d (e,f)). I want to deviate from this: enclose the list in parens and return all of the text between the parens, with an array ref for each nested pair of parens. So the text:

    (string one(string two)string three)

returns:

    ['string one', ['string two'], 'string three']

I think that this grammar does what I want, but I'm not sure if it's the best solution:
    my $grammar = q{
        list: '(' item(s) ')'
        item: list <commit>
            | word
        word: /[^()]*/
    };
I would like to know if there is a better way to do this. Also, with this approach I cannot have any parens in my text strings within parens. If I wanted to be able to escape parens and include them within the text, how would I go about doing it?

Re: Parse::RecDescent - I'm just not getting it
by ikegami (Patriarch) on May 30, 2006 at 18:07 UTC

    That's fine, except for the incorrect return values:

        my $grammar = q{
            list : '(' item(s) ')' { $item[2] }
            item : list <commit>   { $item[1] }
                 | word
            word : /[^()]*/
        };

    I don't like using q{...} for grammars. Here-docs provide the same functionality, but have fewer issues.

        my $grammar = <<'__END_OF_GRAMMAR__';
        list : '(' item(s) ')' { $item[2] }
        item : list <commit>   { $item[1] }
             | word
        word : /[^()]*/
        __END_OF_GRAMMAR__
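
    The reply does not say which issues it has in mind. One likely candidate, offered here as an assumption rather than as ikegami's stated reason, is backslash handling: q{...} collapses \\ into a single backslash, while a single-quoted here-doc leaves backslashes alone. That matters as soon as a grammar regex contains an escaped backslash. A minimal core-Perl sketch (the variable names are only illustrative):

```perl
use strict;
use warnings;

# The same regex text written both ways.
my $from_q = q{/\\ ./};          # q{} turns \\ into a single backslash

my $from_heredoc = <<'__END__';
/\\ ./
__END__
chomp $from_heredoc;             # a here-doc keeps both backslashes

print "q{}:      $from_q\n";
print "here-doc: $from_heredoc\n";
printf "lengths:  %d vs %d\n", length $from_q, length $from_heredoc;
```

    q{...} also needs its braces to balance, so an unmatched brace inside a grammar regex would end the string early. A here-doc has neither restriction.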

    I strongly recommend putting use strict and use warnings in your grammar. The ones in your main program won't affect the code in your grammar.

        my $grammar = <<'__END_OF_GRAMMAR__';
        {
            use strict;
            use warnings;
        }

        list : '(' item(s) ')' { $item[2] }
        item : list <commit>   { $item[1] }
             | word
        word : /[^()]*/
        __END_OF_GRAMMAR__

    Finally, I'm not sure if trailing <commit> directives are of any use. I'll have to look into that.

    Update: If the outside parens are optional, you get:

        list : item(s)
        item : '(' <commit> list ')' { $item[3] }
             | word
        word : /[^()]*/

    And if you don't want empty strings:

        list : item(s?)
        item : '(' <commit> list ')' { $item[3] }
             | word
        word : /[^()]+/
Re: Parse::RecDescent - I'm just not getting it
by blokhead (Monsignor) on May 30, 2006 at 17:57 UTC
    Try these changes:
        list: '(' item(s) ')' { $item[2] }
        item: list <commit>   { $item[1] }
            | word
        word: / (?: [^()\\] | \\ . )+ /x
    To have the list nonterminal actually return the arrayref, you need the semantic action to return $item[2]. Otherwise it would return the right paren every time.
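
    To see the return value end to end, here is a minimal sketch of a complete script around the grammar above. It assumes Parse::RecDescent is installed from CPAN. The here-doc and the Data::Dumper call are additions for illustration, not part of the reply.

```perl
use strict;
use warnings;
use Parse::RecDescent;   # CPAN module, not part of core Perl
use Data::Dumper;

# Grammar from the reply, kept in a single-quoted here-doc so the
# backslashes in the regex reach Parse::RecDescent unchanged.
my $grammar = <<'__END_OF_GRAMMAR__';
    list: '(' item(s) ')' { $item[2] }
    item: list <commit>   { $item[1] }
        | word
    word: / (?: [^()\\] | \\ . )+ /x
__END_OF_GRAMMAR__

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

# Calling the start rule as a method returns that rule's value,
# or undef if the text does not match.
my $tree = $parser->list('(string one(string two)string three)');
die "Input did not parse\n" unless defined $tree;

# $item[2] is the array ref collected by item(s), so the dump should
# show the nested structure asked for in the question.
$Data::Dumper::Indent = 1;
print Dumper($tree);
```

    With the action in place, $tree should hold 'string one', a nested array ref containing 'string two', and 'string three'.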

    The updated "word" regex says to repeatedly try to match either a normal char, or a backslash-escaped char.
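
    The regex can be tried on its own, outside the parser. The sketch below uses only core Perl. The \A anchor, the leading_word helper, and the final unescaping step are illustrative additions, not part of the grammar.

```perl
use strict;
use warnings;

# Same pattern as the "word" rule, anchored at the start of the string
# the way a parser would apply it to the remaining input.
my $word = qr/ \A ( (?: [^()\\] | \\ . )+ ) /x;

# Hypothetical helper: return the leading "word" of $text, or undef.
sub leading_word {
    my ($text) = @_;
    return $text =~ $word ? $1 : undef;
}

my $plain   = leading_word('string one(string two)');  # stops at the unescaped '('
my $escaped = leading_word('a \(b\) c)rest');          # escaped parens are consumed

# The match still contains the backslashes; removing them is a
# separate step if the plain text is wanted.
(my $unescaped = $escaped) =~ s/\\(.)/$1/g;

print "$plain\n$escaped\n$unescaped\n";
```

    Note that the matched text keeps its backslashes, so a grammar that wants the unescaped string would have to strip them, for example in an action on the word rule.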

    blokhead