http://qs1969.pair.com?node_id=555986

tomazos has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to work out how to use Parse::RecDescent and am a little overwhelmed.

I can see how to use it as a recognizer, but to do a simple syntax-directed translation, all of the *items, $return, $text, real return value, etc, stuff has thrown me off.

Let's say you want a translator for languages over ('a','b','c','d','e') such that any balanced pairs of equal length sequences of a's and b's are replaced by c's and d's.

my $start = "eeeeaaaabbbeeee";
my $end = translate($start);
# $end eq "eeeeacccdddeeee" (aaabbb -> cccddd)

(Side note: This is the "classic example" of something that can't be done with regular grammars and needs a context-free grammar. This is because within the regexp /(a*)(b*)/ there is no way to assert that length($1) == length($2).)

The syntax-directed translation (in pseudo-code) would be:

start -> part(s)      { start.t := join('', part(s).t) }
part  -> AnB          { part.t := AnB.t }
part  -> 'a'          { part.t := 'a' }
part  -> /[^a]+/      { part.t := /[^a]+/.t }
AnB   -> 'a' AnB 'b'  { AnB.t := 'c' . AnB.t . 'd' }
AnB   -> 'ab'         { AnB.t := 'cd' }
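To make the intended semantics concrete, here is a minimal hand-written sketch of that scheme in plain Perl, with no parsing module. The sub names `translate` and `parse_AnB` are illustrative only, and the alternatives for `part` are assumed to be tried in the order listed.

```perl
use strict;
use warnings;

# Each parse sub takes the input and a position and returns
# (translation, new position) on success, or an empty list on failure.

# AnB -> 'a' AnB 'b'  { 'c' . AnB.t . 'd' }
# AnB -> 'ab'         { 'cd' }
sub parse_AnB {
    my ($text, $pos) = @_;
    if (substr($text, $pos, 1) eq 'a') {
        if (my ($inner, $next) = parse_AnB($text, $pos + 1)) {
            return ('c' . $inner . 'd', $next + 1)
                if substr($text, $next, 1) eq 'b';
        }
    }
    return ('cd', $pos + 2) if substr($text, $pos, 2) eq 'ab';
    return;
}

# start -> part(s), trying the three 'part' alternatives in order.
sub translate {
    my ($text) = @_;
    my ($pos, $out) = (0, '');
    while ($pos < length $text) {
        if (my ($t, $next) = parse_AnB($text, $pos)) {   # part -> AnB
            ($out, $pos) = ($out . $t, $next);
        }
        elsif (substr($text, $pos, 1) eq 'a') {          # part -> 'a'
            $out .= 'a';
            $pos++;
        }
        else {                                           # part -> /[^a]+/
            pos($text) = $pos;
            $text =~ /\G([^a]+)/gc or last;
            $out .= $1;
            $pos = pos $text;
        }
    }
    return $out;
}

print translate('eeeeaaaabbbeeee'), "\n";   # eeeeacccdddeeee
```

The surplus leading 'a' falls through to the second `part` alternative, which is why only "aaabbb" is rewritten.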

Any ideas on how this translates into Parse::RecDescent? Is there a more appropriate parsing module to use for cases where the input language is very similar to the output language?

-Andrew.

Re: Parse::RecDescent for simple syntax-directed translation
by Limbic~Region (Chancellor) on Jun 18, 2006 at 15:16 UTC
    tomazos,
    (This is the "classic example" of something that can't be done with regular expressions, and needs a context-free grammar. This is because within the regexp /(a*)(b*)/ there is no way to assert that length($1) == length ($2). Anyhoo...)

    While this is probably strictly true, Perl is all about letting you get the job done:

    my $str = "eeeeaaaabbbeeee";
    $str =~ s/((a+)(??{'b'x length$2}))/'c' x (length($1) * .5) . 'd' x (length($1) * .5)/e;
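For comparison, the same substitution can probably be written with a recursive subpattern, `(?1)`, which has been available since Perl 5.10. This is a swapped-in alternative to the `(??{...})` construct, not the code above; the sub name `translate_re` is made up for illustration, and `/g` is added so that every balanced run is rewritten, not just the first.

```perl
use strict;
use warnings;

sub translate_re {
    my ($str) = @_;
    # Group 1 matches 'a', optionally itself again, then 'b':
    # that is, n a's followed by exactly n b's, for n >= 1.
    $str =~ s{(a(?1)?b)}{
        my $n = length($1) / 2;     # half the match is a's, half is b's
        ('c' x $n) . ('d' x $n);
    }ge;
    return $str;
}

print translate_re('eeeeaaaabbbeeee'), "\n";   # eeeeacccdddeeee
```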

    Anyhoo...I have just recently started learning Parse::RecDescent. Update: I am not sure if this is what you had in mind, but the following accomplishes what you want without using sneaky experimental regex features.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Parse::RecDescent;

    $Parse::RecDescent::skip = '';

    my $grammar = q{
        match  : PREFIX TOKEN SUFFIX
                 { print join '', @item[1..3] }
        PREFIX : /.*?(?=a+b+)/
        TOKEN  : /a+b+/
                 {
                     my $str   = $item[1];
                     my $a_cnt = $str =~ tr/a//;
                     my $b_cnt = $str =~ tr/b//;
                     if ($a_cnt == $b_cnt) {
                         $return = ('c' x $a_cnt) . ('d' x $b_cnt);
                     }
                     elsif ($a_cnt > $b_cnt) {
                         $return = ('a' x ($a_cnt - $b_cnt)) . ('c' x $b_cnt) . ('d' x $b_cnt);
                     }
                     else {
                         $return = ('c' x $a_cnt) . ('d' x $a_cnt) . ('b' x ($b_cnt - $a_cnt));
                     }
                 }
        SUFFIX : /.*$/
    };

    my $parser = Parse::RecDescent->new($grammar);
    $parser->match('sing aaaaaabbb song');

    A lot of this code could be simplified and improved. I am neither a regex nor Parse::RecDescent guru. I did show how either could work.

    Once you have one or more a's followed by one or more b's ($item[1]), you only need to calculate the desired string and assign it to $return. The explicit assignment to $return is not strictly necessary, since the value of the last expression in the action is returned, just as with Perl's subroutines.
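The calculation inside the TOKEN action can be tried on its own, outside any grammar. The following is a minimal sketch in plain Perl that collapses the three branches into one expression. The sub name `translate_run` is hypothetical, and it assumes its argument already matches /a+b+/.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Pair up as many a's and b's as possible and leave the unpaired
# surplus (leading a's or trailing b's) unchanged.
sub translate_run {
    my ($run) = @_;
    my $a_cnt = ($run =~ tr/a//);   # tr with an empty replacement only counts
    my $b_cnt = ($run =~ tr/b//);
    my $pairs = min($a_cnt, $b_cnt);
    return join '',
        'a' x ($a_cnt - $pairs),
        'c' x $pairs,
        'd' x $pairs,
        'b' x ($b_cnt - $pairs);
}

print translate_run('aaaaaabbb'), "\n";   # aaacccddd
```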

    Cheers - L~R

      Thanks for answering my question. The regex solution is cool.

      I should have qualified my statement by saying that standard regular grammars cannot handle this sort of pattern, whereas Perl's regexes can do everything and anything.

      In fact, embedded actions within a Perl regex can do anything Perl can do. Therefore Perl regexes can do anything Perl can do. :)

      I guess I can see from your use of Parse::RecDescent (Update: and from re-reading the very long manual last night) that the answer to my question is something like:

      my $grammar = q{
          match : part(s) { print join('', @{$item[1]}) }
          part  : AnB
          part  : 'a'
          part  : /[^a]+/
          AnB   : 'a' AnB 'b' { 'c' . $item[2] . 'd' }
          AnB   : 'ab' { 'cd' }
      }

      -Andrew.

        Close. Your grammar strips out all whitespace, because by default Parse::RecDescent skips leading whitespace before every terminal. Replace
        match : part(s) { print join('', @{$item[1]}) }
        with
        match : <skip:''> part(s) { print join('', @{$item[2]}) }

        A slight improvement is to replace
        match : <skip:''> part(s) { print join('', @{$item[2]}) }
        with
        process : <skip:''> part(s) { join('', @{$item[2]}) }
        so you can do
        $filter = Parse::RecDescent->new($grammar);
        print $filter->process('eeeeaaaabbbeeee');