in reply to Using Perl6 patterns/grammar definition for 'output'?

I think/hope that Perl6 will have adequate introspection. But you'd have to ensure that a match-object contains all the information needed to fully constrain the generated output. Consider
rule foo:w { $a := (\d+) = $b := (\w+) [ , | and ] $c := (\w+) };
dump( rule=>"foo"; a=>7, b=>"hello", c=>"world" );
After substituting the values into the captures, we are left with:
\s+ 7 \s* = \s* hello [ \s* , \s* | \s+ and \s+ ] world
So the dumper now needs to make two sets of decisions: what whitespace to substitute for each \s+ and \s*, and whether to use a comma or the word "and" as the final separator. The whitespace issue could be handled by a standard pretty-print policy. But in the absence of any further hints, the dump function can probably do no better than choose an arbitrary (e.g. always the first) member of an uncaptured alternation.

So yes, I definitely think it could be done, but we may want a mechanism to set the policy for the unconstrained parts of a rule. A simple set of rules, such as "generate as little text as possible" and "choose the first option on alternations", would probably be enough for simple expressions. But what about look-ahead assertions or embedded code? It would be much harder to define simple rules for these. Perhaps our dump function would require anything non-trivial to be captured.
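
To make the two policies concrete, here is a minimal sketch in Python (not Perl6, whose syntax is still in flux). It hand-encodes the rule foo as a list of nodes and applies "generate as little text as possible" plus "choose the first option on alternations". The node classes (Lit, Cap, Ws, Alt) and the generate function are hypothetical names invented for this illustration; nothing here reflects an actual Perl6 introspection API.

```python
from dataclasses import dataclass


@dataclass
class Lit:
    text: str          # literal text that appears in the rule


@dataclass
class Cap:
    name: str          # named capture, filled in from the supplied values


@dataclass
class Ws:
    required: bool     # True stands for \s+, False for \s*


@dataclass
class Alt:
    options: list      # uncaptured alternation; each option is a node list


def generate(nodes, values):
    """Emit text for a node list using the two simple policies."""
    out = []
    for node in nodes:
        if isinstance(node, Lit):
            out.append(node.text)
        elif isinstance(node, Cap):
            out.append(str(values[node.name]))
        elif isinstance(node, Ws):
            # Policy 1: as little text as possible.
            out.append(" " if node.required else "")
        elif isinstance(node, Alt):
            # Policy 2: always take the first alternative.
            out.append(generate(node.options[0], values))
        else:
            raise TypeError(f"unknown node: {node!r}")
    return "".join(out)


# Hand-written encoding of: $a = $b [ , | and ] $c  (with :w whitespace)
FOO = [
    Cap("a"), Ws(False), Lit("="), Ws(False), Cap("b"),
    Alt([
        [Ws(False), Lit(","), Ws(False)],
        [Ws(True), Lit("and"), Ws(True)],
    ]),
    Cap("c"),
]

text = generate(FOO, {"a": 7, "b": "hello", "c": "world"})
print(text)  # 7=hello,world
```

The result is legal input for the rule but not pretty, which is why a separate pretty-print policy for whitespace seems worth having.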

An XML grammar would probably be a nice thing to use to play with the idea: more complex than your pack/unpack (BTW: I like that idea); but much simpler than, say, a full Perl6 grammar. --Dave

Re: Re: Using Perl6 patterns/grammar definition for 'output'?
by John M. Dlugosz (Monsignor) on Sep 05, 2002 at 14:49 UTC
    I was thinking that a derived grammar would be unambiguous about the output format, but you would reuse many useful grammar rules.

    Instead of $a := (\d+), write $a := <number_token>, where number_token is itself defined as you show, or as something more complex such as (0|0[xX])\d+. Then a derived grammar would override number_token to be something that formats, as opposed to something that accepts flexible input. Some standard mechanism would say "here is the string to be emitted" within the {} block of the rule.

    Terminals, by definition, would need to contain code to format the output. Nonterminals just state the arrangement of terminals and, in the absence of alternation and repetition (push that down to a lower-level rule), would be used as-is.
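
    A rough sketch of that idea, in Python rather than Perl6: the nonterminal is only a sequence of terminal names, the base class gives each terminal parsing behaviour, and a derived class overrides the same names with formatting behaviour. The class names, the terminal patterns, and the parse/emit methods are all assumptions made up for this illustration.

```python
import re


class Grammar:
    # Nonterminal: only states the arrangement of terminals.
    assignment = ("name_token", "equals", "number_token")

    @staticmethod
    def _take(pattern, text):
        """Match pattern at the start of text; return (group 1, rest)."""
        m = re.match(pattern, text)
        if m is None:
            raise ValueError(f"no match for {pattern!r} at {text!r}")
        return m.group(1), text[m.end():]

    # Terminals (parsing versions): accept flexible input.
    def name_token(self, text):
        return self._take(r"\s*(\w+)", text)

    def equals(self, text):
        return self._take(r"\s*(=)", text)

    def number_token(self, text):
        raw, rest = self._take(r"\s*(0[xX][0-9a-fA-F]+|\d+)", text)
        base = 16 if raw.lower().startswith("0x") else 10
        return int(raw, base), rest

    def parse(self, text):
        values = {}
        for term in self.assignment:
            values[term], text = getattr(self, term)(text)
        return values


class OutputGrammar(Grammar):
    # Terminals (formatting versions): one fixed output form each.
    def name_token(self, value):
        return value

    def equals(self, value):
        return " = "

    def number_token(self, value):
        return f"{value:#x}"

    def emit(self, values):
        # The inherited nonterminal is reused as-is.
        return "".join(getattr(self, t)(values[t]) for t in self.assignment)


parsed = Grammar().parse("total=0x1F")
print(OutputGrammar().emit(parsed))  # total = 0x1f
```

    Only the terminals change between the two classes; the arrangement in assignment is inherited untouched.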

    —John

      Using named rules to key formatting sounds like a good idea, but I'm not sure that I like the idea of using a "derived grammar" to represent the formatting of these named rules: it seems better to introduce an explicit formatting object. This formatting object would provide information to a visitor that traverses a parse tree (i.e. a match object plus its grammar), but it would not have the hierarchical structure of the grammar.

      Using a separate formatting object has several advantages. Firstly, it avoids the problem of maintaining parallel hierarchies (you want formatting to be robust against modifications to the grammar). Even if you are only overriding the terminal rules of the grammar, you still have the issue that a change to the base grammar would be obscured in the derived grammar.

      A more important issue is that it will be common to import terminal symbols from other grammars (e.g. CORE::*). How many grammars do you want to override, just to format some output?

      I see a formatting object as a simple map of rule_name => format spec, with some mechanism for composition. This doesn't need to follow the hierarchy of a grammar:

      my $format = new Format (
          ws       => { col >= 70 ? "\n" : " " },
          currency => { sprintf("%.2f", $currency) },
          date     => { sprintf("%04d-%02d-%02d", @$date{qw/year month day/}) },
      );
      $text =~ /<grammar.rule>/;
      $format.output($0; grammar=>"grammar.rule", fh=>$*STDOUT)
      The output method would traverse the grammar; each time it meets a rule that has a format, it applies that format to the corresponding entry in the match object ($0). This does not require a derived grammar: just an extra specification object. Sometimes composition is better than inheritance.
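
      The traversal could look roughly like the following Python sketch. The Match and Format classes here are hypothetical stand-ins for a Perl6 match object and the format object above, and the sketch assumes each format spec receives its match entry as an explicit argument, which sidesteps the question of how $date or $currency would be bound implicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    rule: str                      # name of the rule that matched
    value: object = None           # captured data for a leaf rule
    children: list = field(default_factory=list)


class Format:
    def __init__(self, **specs):
        # Flat map: rule name -> callable(entry value) -> str
        self.specs = dict(specs)

    def output(self, match):
        spec = self.specs.get(match.rule)
        if spec is not None:
            # A format exists for this rule: use it and do not descend.
            return spec(match.value)
        if match.children:
            # No format: the rule only arranges its sub-rules.
            return "".join(self.output(child) for child in match.children)
        # Unformatted leaf: emit the matched text unchanged.
        return str(match.value)


fmt = Format(
    ws=lambda _: " ",
    currency=lambda v: "%.2f" % v,
    date=lambda d: "%04d-%02d-%02d" % (d["year"], d["month"], d["day"]),
)

tree = Match("line", children=[
    Match("date", {"year": 2002, "month": 9, "day": 5}),
    Match("ws", " \t "),
    Match("currency", 12.5),
])

print(fmt.output(tree))  # 2002-09-05 12.50
```

      The format map stays flat even though the match tree is nested, so a change to the grammar's hierarchy does not require touching the formats.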

      --Dave

        I see: instead of having formatting as a side-effect of the grammar rules, you would supply separate formatting rules that match by name. If there is a match, it uses that and doesn't look at the corresponding grammar rule.

        I think the way you write it should certainly be possible, meaning that the grammar object should be traversable enough to allow format.output to be written in Perl.

        I like how the return value is the text to output, but I don't see how $date, $currency, etc. would become implicit like they are in the grammar. If the populated $0 becomes the current topic when a call is made, that would be easy enough: $.currency, right?

        —John